Analysis of Advanced Aggregation Techniques for Software Metrics
      Final presentation

      Bogdan Vasilescu
      b.n.vasilescu@student.tue.nl
      Supervisor: Dr. Alexander Serebrenik




July 20, 2011
Analysis of advanced       aggregation techniques for software metrics   2/32




     Most metrics do not have a definition at system level.




/   department of mathematics and computer science
Analysis of advanced aggregation techniques for software metrics      3/32




             “Designing a sound aggregation of software metrics is not
                obvious and it is still an open issue.” [CSS09]

             Goal
             Derive requirements for aggregation techniques for software
             metrics.




Aggregation of software metrics                 4/32




      Many to one:
          Same artifact
          Different metrics
          Example: Maintainability Index

      One to many:
          Same metric
          Different artifacts
          Example: Weighted Methods per Class

Approach                                                                      5/32




     Study existing aggregation techniques:
         - traditional (e.g., mean, median)
         - inequality indices (e.g., Gini, Theil)
         - threshold-based (e.g., SIG, Squale)

     Theoretical analysis and empirical analysis

     Derive requirements for one-to-many aggregation techniques for
     software metrics

Inequality indices                                                                                                                                6/32




     Econometrics: measure/explain the inequality of income or wealth.

     Software metrics and econometric variables have distributions with
     similar shapes.

     [Figure: two frequency histograms. Left: Source Lines of Code per class, freecol-0.9.4 (x-axis: SLOC per class). Right: household income in Ilocos, Philippines, 1998 (x-axis: Income).]


Degree of concentration of functionality                                                7/32




      Lorenz curve for SLOC in Hibernate 3.6.0-beta4.

      [Figure: Lorenz curve; x-axis: % Classes, y-axis: % SLOC, both from 0.0 to 1.0]








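      On the Lorenz curve, let A be the area between the line of equality
      and the curve, and B the area under the curve. Then

          $I_{\mathrm{Gini}} = \frac{A}{A + B} = 2A$

      and $I_{\mathrm{Hoover}}$ is the largest vertical distance between the line
      of equality and the curve.

      Measure inequality between:
          individuals (e.g., classes)
          groups (e.g., components)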
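
      A minimal sketch of how both indices could be computed from a list of per-class SLOC values. The function names and the sorted-values formula for the Gini index are assumptions chosen for illustration; the slides only give the geometric definition.

```python
def gini(values):
    """Gini index via the sorted-values formula (equals 2A on the Lorenz curve)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    # G = sum_i (2i - n - 1) * x_(i) / (n * total), i = 1..n over sorted values
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def hoover(values):
    """Hoover index: half of the relative mean absolute deviation."""
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    mean = total / n
    return sum(abs(x - mean) for x in values) / (2 * total)
```

      Both functions return 0 when all classes have the same size and approach 1 as the functionality concentrates in a single class.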








      When computing the inequality within the entire population, it is
      often desirable to assess the contribution of the inequality
      between the groups.

      Decomposability:

          $I(X) = I_{\mathrm{within}} + I_{\mathrm{between}} = \sum_{j=1}^{m} \omega_j I(X_j) + I_{\mathrm{between}}$
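
      The decomposition can be checked numerically. The sketch below uses the Theil index and assumes that the weight of a group is its share of the total, which is the usual choice for this index but is not stated on the slide; names and data are illustrative.

```python
import math


def theil(values):
    """Theil index T = (1/n) * sum((x/mean) * ln(x/mean)); assumes positive values."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values) / n


def theil_decomposition(groups):
    """Return (within, between) for a partition given as a list of lists."""
    population = [x for g in groups for x in g]
    total = sum(population)
    mean = total / len(population)
    within = 0.0
    between = 0.0
    for g in groups:
        share = sum(g) / total            # assumed weight: the group's share of the total
        group_mean = sum(g) / len(g)
        within += share * theil(g)
        between += share * math.log(group_mean / mean)
    return within, between


# Hypothetical SLOC values of the classes in two packages.
packages = [[10, 20, 30], [200, 400]]
within, between = theil_decomposition(packages)
overall = theil([x for g in packages for x in g])
```

      Here `within + between` agrees with `overall` up to rounding.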




Traceability via decomposability                                            8/32




     Share of inequality explained by the partitioning $G = \{G_1, \ldots, G_m\}$:

         $R(G) = \frac{I_{\mathrm{between}}(G)}{I(X)}$

     Which individuals (classes in package) contribute to 80% of the
     inequality of SLOC?
     Which class contributes the most to the inequality?








     Lemma
     Let $X = \{x_1, x_2, \ldots, x_n\}$ be a collection of values such that
     $x_1 \le x_i \le x_n$ for all $i$. Then it is either $x_1$ or $x_n$ that
     contributes the most to the inequality measured using $I_{\mathrm{Theil}}$,
     i.e., it is either the partitioning $(\{x_1\}, X \setminus \{x_1\})$ or the
     partitioning $(\{x_n\}, X \setminus \{x_n\})$ that provides the best
     explanation for the inequality measured using $I_{\mathrm{Theil}}$.
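
     The lemma can be illustrated by brute force: compute the between-group term for every singleton partition and see which one is largest. Since $I(X)$ is the same for all partitionings, maximising the between-group term also maximises the explained share. This is a sketch under the assumption of share-of-total weights for the Theil index; function names and data are hypothetical.

```python
import math


def theil_between_singleton(values, k):
    """Between-group Theil term for the partition ({x_k}, X minus {x_k}).

    Assumes positive values and at least two of them.
    """
    n = len(values)
    total = sum(values)
    mean = total / n
    x = values[k]
    rest_total = total - x
    rest_mean = rest_total / (n - 1)
    return ((x / total) * math.log(x / mean)
            + (rest_total / total) * math.log(rest_mean / mean))


def best_explaining_index(values):
    """Index k whose singleton partition explains the largest share of inequality."""
    return max(range(len(values)),
               key=lambda k: theil_between_singleton(values, k))


# Hypothetical, sorted SLOC values of the classes in one package.
sloc = [12, 40, 75, 90, 130, 2500]
k = best_explaining_index(sloc)
```

     For sorted input, `k` is either the first or the last index, as the lemma predicts.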




Other properties of inequality indices                             9/32




     Symmetry




     Inequality stays the same for any permutation of the population.




Other properties of inequality indices                                 10/32




     Population principle




     Inequality does not change if the population is replicated any number of
     times.



Other properties of inequality indices                              11/32




     Transfers principle




     A transfer from a rich man to a poor man (without reversing their
     position) should decrease inequality.




     [Figure: illustration of a transfer; surviving labels: 20, 36, 45 and 30, 36]




Other properties of inequality indices                                12/32




     Scale invariance




     Inequality does not change if all values are multiplied by the same
     constant.
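
     The four properties above can be checked numerically for a concrete index. The sketch below does so for the Gini index; the sample values and variable names are assumptions made for illustration.

```python
def gini(values):
    """Gini index via the sorted-values formula; assumes a positive total."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def close(a, b, tol=1e-12):
    return abs(a - b) <= tol


sloc = [20, 36, 45, 120, 400]    # hypothetical per-class SLOC values

# Symmetry: any permutation gives the same value.
symmetry = close(gini(sloc), gini(list(reversed(sloc))))

# Population principle: replicating the population changes nothing.
population = close(gini(sloc), gini(sloc * 3))

# Scale invariance: multiplying all values by a constant changes nothing.
scale = close(gini(sloc), gini([10 * x for x in sloc]))

# Transfers principle: move 50 SLOC from the largest class to the smallest
# one (their relative order is preserved); inequality must decrease.
transferred = [70, 36, 45, 120, 350]
transfers = gini(transferred) < gini(sloc)
```

     All four flags evaluate to `True` for this data.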
Summary                                                                        13/32




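     Ineq. index                             Sym.   Inv.   Dec.   Pop.   Tra.
     $I_{\mathrm{Gini}}$                            ×
     $I_{\mathrm{Theil}}$                           ×
     $I_{\mathrm{MLD}}$                             ×
     $I_{\mathrm{Hoover}}$                          ×
     $I_{\mathrm{Atkinson}}^{\alpha}$               ×
     $I_{\mathrm{Kolm}}^{\beta}$                    +

     Inv.: × = invariant under multiplication, + = invariant under addition.

     Problems include:
         Domain is not always $\mathbb{R}^n$.
         No distinction between all values equal but low, and all values
         equal but high.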



Threshold-based aggregation techniques                                        14/32




     Two types:
             hard thresholds: improvements in quality are not reflected as long
             as the metrics stay within certain boundaries (e.g., SIG).
             soft thresholds: do not exhibit staircasing effects (e.g., Squale).
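
     The difference can be sketched with two toy rating functions. Neither is the actual SIG or Squale function: the thresholds (30, 60, 90) and the linear interpolation are hypothetical and only illustrate the staircasing effect.

```python
def hard_mark(sloc, thresholds=(30, 60, 90)):
    """Staircase rating from 3 (best) to 0; changes only when a threshold is crossed."""
    return 3 - sum(1 for t in thresholds if sloc > t)


def soft_mark(sloc, good=30, bad=90):
    """Continuous rating in [0, 3]: 3 up to `good`, 0 from `bad`, linear in between."""
    if sloc <= good:
        return 3.0
    if sloc >= bad:
        return 0.0
    return 3.0 * (bad - sloc) / (bad - good)
```

     Shrinking a method from 59 to 31 SLOC leaves `hard_mark` unchanged, because both sizes fall between the same two thresholds, while `soft_mark` increases.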




The Squale Quality Model                                      15/32




       Metrics -> Individual Marks in [0,3] -> Global Mark in [0,3]




       [Figure: Individual Mark (IM), from 0.0 to 3.0, plotted against SLOC per method, from 0 to 170]
Properties of Squale aggregation                                    16/32




             Symmetry
             Population principle
             Anti-transfers principle

             [Figure: illustration of a transfer; surviving labels: 20, 36, 45 and 30, 36]



Properties of Squale aggregation                                                      17/32




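     Lemma
     $I_{\mathrm{Kolm}}^{\log \lambda}(x_1, \ldots, x_n) + I_{\mathrm{Squale}}^{\lambda}(x_1, \ldots, x_n) = \bar{x}$

     Lemma
     $I_{\mathrm{Squale}}^{\lambda}$ is “unit translatable”, i.e., for all $c \in \mathbb{R}$ it holds that

         $I_{\mathrm{Squale}}^{\lambda}(x_1 + c, \ldots, x_n + c) = I_{\mathrm{Squale}}^{\lambda}(x_1, \ldots, x_n) + c$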


     Inequality indices are invariant with respect to either multiplication
     or addition.
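
     Both lemmas can be checked numerically. The slides do not spell out the two definitions, so the sketch below assumes the usual forms $I_{\mathrm{Squale}}^{\lambda} = -\log_{\lambda}\big(\frac{1}{n}\sum_i \lambda^{-x_i}\big)$ and $I_{\mathrm{Kolm}}^{\beta} = \frac{1}{\beta}\ln\big(\frac{1}{n}\sum_i e^{\beta(\bar{x} - x_i)}\big)$; the marks and the value of λ are hypothetical.

```python
import math


def squale(marks, lam):
    """Assumed Squale aggregation: -log_lam(mean(lam ** -x))."""
    n = len(marks)
    return -math.log(sum(lam ** (-x) for x in marks) / n, lam)


def kolm(values, beta):
    """Assumed Kolm index: (1/beta) * ln(mean(exp(beta * (mean - x))))."""
    n = len(values)
    mean = sum(values) / n
    return math.log(sum(math.exp(beta * (mean - x)) for x in values) / n) / beta


marks = [0.5, 1.5, 3.0]    # hypothetical individual marks
lam = 9.0                  # hypothetical weighting parameter
mean = sum(marks) / len(marks)

# First lemma: Kolm (with beta = ln(lam)) plus Squale gives the mean.
lemma_sum = kolm(marks, math.log(lam)) + squale(marks, lam)

# Second lemma: shifting every mark by c shifts the Squale result by c.
c = 1.0
shifted = squale([x + c for x in marks], lam)
```

     Up to rounding, `lemma_sum` equals `mean` and `shifted` equals `squale(marks, lam) + c`.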



Summary                                                                   18/32




     We distill the following requirement:
             Highlight undesirable values in the aggregated result.

     However, problems include:
             Thresholds should be derived and validated.
             A high rating is not necessarily an indication of good software
             engineering practices.
             Not decomposable.




Empirical evaluation                            20/32




Pilot study                                                                                     21/32




     Aggregate SLOC from class to package level.
     Study statistical correlation between:
         aggregation techniques and number of defects per package.
         pairs of aggregation techniques.
     Case studies: ArgoUML, Adempiere, Mogwai.
     Questions:
             Does the aggregation technique influence the correlation with bugs?
                  •   Correlation between SLOC and defects is not strong, and is
                      influenced by the aggregation technique.
             Which aggregation techniques convey the same information?
                  •   $I_{\mathrm{Gini}}$, $I_{\mathrm{Theil}}$, $I_{\mathrm{MLD}}$, $I_{\mathrm{Hoover}}$, and $I_{\mathrm{Atkinson}}$ convey the same information.
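
     A minimal sketch of the kind of rank correlation involved, assuming the tau-a variant of Kendall's coefficient (the slides do not say which variant was used). The per-package values are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical per-package data: mean SLOC per class and number of defects.
mean_sloc = [120.0, 85.5, 240.0, 60.0, 150.0]
defects = [7, 3, 9, 4, 5]
tau = kendall_tau_a(mean_sloc, defects)
```

     The same function can be applied to two aggregated series (for example Gini and Theil per package) to see whether they rank the packages alike.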

Threats to validity                                                                           22/32


        Threat         Pilot                          Subsequent studies
        Metric         SLOC                           SLOC, LOC, NOS, NOSt, DIT, NOC, PBS, PLwC
        System         ArgoUML, Adempiere, Mogwai     Qualitas Corpus: 106 Java open-source systems, 430K files, 57 MSLOC
        Version        single                         414 from 13/106 systems (> 10 versions)
        Technique      traditional, ineq. indices     traditional, ineq. indices, threshold-based
        Aggr. level    class–package                  class–package, method–class




Results (1)                                                                                                           23/32

     $I_{\mathrm{Gini}}$, $I_{\mathrm{Theil}}$, $I_{\mathrm{MLD}}$, $I_{\mathrm{Atkinson}}$, and $I_{\mathrm{Hoover}}$ always convey the same information.
     [Figure: one panel per metric, y-axis from -1.0 to 1.0, one entry per pair of inequality indices, with the percentage printed under each pair]
        SLOC: MLD-Hoo (91%), Gin-MLD (89%), The-MLD (91%), Gin-Hoo (90%), Atk-Hoo (92%), The-Hoo (92%), Gin-Atk (90%), MLD-Atk (91%), Gin-The (91%), The-Atk (92%)
        DIT:  MLD-Hoo (85%), Atk-Hoo (87%), Gin-MLD (87%), The-Hoo (88%), Gin-Atk (88%), Gin-Hoo (89%), Gin-The (88%), The-MLD (88%), The-Atk (88%), MLD-Atk (89%)



Results (2)                                                                                                                                                                                                                            24/32



     $I_{\mathrm{Kolm}}$ shows high correlation with the mean for size metrics.

     [Figure: three panels, "Kendall corr.: mean - Kolm (SLOC)", "Kendall corr.: mean - Kolm (DIT)", "Kendall corr.: mean - Kolm (PLwC)"; y-axis: Kendall correlation coefficient, from -1.0 to 1.0]



Results (3)                                                                                                                                                 25/32



     Superlinear (e.g., $I_{\mathrm{Theil}}$–$I_{\mathrm{Gini}}$) and chaotic (e.g., $I_{\mathrm{Theil}}$–$I_{\mathrm{Kolm}}$) patterns can
     be observed in the scatter plots.

     [Figure: two scatter plots for compiere. Left: Theil (SLOC) against Gini (SLOC), Kendall: 0.94, p-val: 0.00. Right: Theil (SLOC) against Kolm (SLOC), Kendall: 0.25, p-val: 0.01.]




Results (4)                                                                                                                                                                                                                                      26/32



     Changing the aggregation level to class level does not affect the
     correlation between various aggregation techniques as measured at
     package level.

     [Figure: three panels, "Kendall: Gini - Theil (SLOC) (100%)", "Kendall: Theil - Atkinson (SLOC) (100%)", "Kendall: Theil - MLD (SLOC) (100%)"; y-axis: Kendall correlation coefficient, from -1.0 to 1.0]



Results (5)

     Figure: Cor. coeff. Theil(SLOC) − Kolm(SLOC), axis from 0.0 to 1.0, one value per version.
     Versions shown:
     0.8.1, 1.0, 1.1, 2.0-beta-1, 2.0-beta-2, 2.0-beta-3, 2.0-beta-4, 2.0-final, 2.0-rc2,
     2.0.1, 2.0.2, 2.0.3, 2.1-beta-1, 2.1-beta-2, 2.1-beta-3, 2.1-beta-3b, 2.1-beta-4,
     2.1-beta-5, 2.1-beta-6, 2.1-final, 2.1-rc1, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.1.5, 2.1.6,
     2.1.7, 2.1.8, 3.0, 3.0-alpha, 3.0-beta1, 3.0-beta2, 3.0-beta3, 3.0-beta4, 3.0-rc1,
     3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1, 3.1-alpha1, 3.1-beta1, 3.1-beta2, 3.1-beta3,
     3.1-rc1, 3.1-rc2, 3.1-rc3, 3.1.1, 3.1.2, 3.1.3, 3.2-alpha1, 3.2-alpha2, 3.2-cr1,
     3.2-cr2, 3.2.0-cr3, 3.2.0-cr4, 3.2.0-cr5, 3.2.0.ga, 3.2.1-ga
                                                       3.2.2−ga
                                                       3.2.3−ga
                                                                                                           hibernate − Kendall(Theil(SLOC), Kolm(SLOC)) (86 releases)




                                                       3.2.4−ga
                                                     3.2.4−sp1
                                                       3.2.5−ga
                                                       3.2.6−ga
                                                                                                                                                                        techniques, e.g., ITheil –IKolm increases with system size.




                                                       3.2.7−ga
                                                      3.3.0−cr2
                                                       3.3.0−ga
                                                     3.3.0−sp1
                                                       3.3.0.cr1
                                                       3.3.1−ga
                                                       3.3.2−ga
                                                 3.5.0−beta−1
                                                 3.5.0−beta−2
                                                 3.5.0−beta−3
                                                 3.5.0−beta−4
                                                    3.5.0−cr−1
                                                                                                                                                                        System size does influence the correlation between aggregation




                                                    3.5.0−cr−2
                                                     3.5.3−final
                                                     3.5.5−final
                                                   3.6.0−beta1
                                                   3.6.0−beta2
                                                   3.6.0−beta3
                                                   3.6.0−beta4
                                                                                                                                                                                                                                        27/32
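The quantity plotted above can be sketched as follows: aggregate class-level SLOC per package with the Theil and Kolm indices, then compute Kendall's tau between the two resulting series. This is a minimal sketch, assuming the textbook definitions of both indices and the tau-a variant of Kendall's coefficient; the function names, the `kappa` value, and the SLOC lists are hypothetical and are not data from the study.

```python
import math
from itertools import combinations


def theil(values):
    """Theil index: mean of (x/mu) * ln(x/mu). Assumes all values > 0."""
    mu = sum(values) / len(values)
    return sum((x / mu) * math.log(x / mu) for x in values) / len(values)


def kolm(values, kappa=1.0):
    """Kolm index: (1/kappa) * ln(mean(exp(kappa * (mu - x))))."""
    mu = sum(values) / len(values)
    mean_exp = sum(math.exp(kappa * (mu - x)) for x in values) / len(values)
    return math.log(mean_exp) / kappa


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    for i, j in combinations(range(len(xs)), 2):
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        score += (prod > 0) - (prod < 0)
    n = len(xs)
    return score / (n * (n - 1) / 2)


# Hypothetical class-level SLOC values, one list per package.
packages = [
    [10, 10, 10, 10],
    [5, 10, 15, 20],
    [1, 2, 3, 50],
    [2, 2, 2, 80],
]

theil_values = [theil(p) for p in packages]
kolm_values = [kolm(p, kappa=0.1) for p in packages]
tau = kendall_tau_a(theil_values, kolm_values)
print("Kendall tau between Theil and Kolm aggregates:", tau)
```

Repeating this per release, and relating the resulting tau to the size of each release, gives the kind of series the plot shows.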
Results (6)                                                          28/32

     SIG and Squale correlate positively to each other and negatively to all
     other aggregation techniques.

     [Figure: three panels, "Kendall: Squale(3) - SIGd (SLOC) (95%)",
     "Kendall: Gini - Squale(3) (SLOC) (95%)", and
     "Kendall: Theil - Squale(3) (SLOC) (95%)";
     y-axis: Kendall correlation coefficient, from -1.0 to 1.0.]

Results (7)                                                          29/32

     Inequality indices are less appropriate for highlighting undesirable
     values unless assumptions about their number can be made.

     [Figure: three panels, "Squale (weight = 3) aggregate for different
     percentages of perfect IMs", "Theil aggregate for different percentages
     of perfect IMs", and "Kolm aggregate for different percentages of
     perfect IMs"; x-axis: Percentage of imperfect marks (0 to 100);
     left y-axes: Average Squale (weight = 3) mark (0.0 to 3.0), Average
     Theil aggregate (0.0 to 2.0), Average Kolm aggregate (0.0 to 1.0);
     right y-axis: Average mean range (0.0 to 3.0); series: range [2, 3),
     range [1, 2), range [0.5, 1), range [0.1, 0.5), range (0, 0.1).]

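The contrast on this slide can be illustrated with a small experiment. This is a minimal sketch, assuming that the Squale aggregate of individual marks (IMs) in [0, 3] is computed as -log_w(mean(w^(-IM))) with weight w = 3, and that the Theil index uses its textbook definition; the helper `marks_with_imperfect` and the chosen bad mark of 0.5 are hypothetical.

```python
import math


def squale(marks, weight=3.0):
    """Squale-style aggregate (assumed form): -log_weight(mean(weight ** -mark))."""
    mean_pow = sum(weight ** (-m) for m in marks) / len(marks)
    return -math.log(mean_pow, weight)


def theil(values):
    """Theil index: mean of (x/mu) * ln(x/mu). Assumes all values > 0."""
    mu = sum(values) / len(values)
    return sum((x / mu) * math.log(x / mu) for x in values) / len(values)


def marks_with_imperfect(n, percent_imperfect, bad_mark):
    """Build n marks: the given percentage set to bad_mark, the rest perfect (3.0)."""
    n_bad = round(n * percent_imperfect / 100)
    return [bad_mark] * n_bad + [3.0] * (n - n_bad)


# Print both aggregates for a growing share of imperfect marks.
for pct in (0, 10, 50, 90, 100):
    marks = marks_with_imperfect(100, pct, bad_mark=0.5)
    print(pct, round(squale(marks), 3), round(theil(marks), 3))
```

Under these assumptions the Squale-style aggregate falls steadily as the share of imperfect marks grows, whereas the Theil index is zero both when all marks are perfect and when all marks are equally bad, since it measures inequality rather than level. Interpreting it therefore requires knowing roughly how many marks are imperfect.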
Summary                                                                         30/32




     We distill:
       - Correlation with Squale or SIG, for aggregation techniques that
         satisfy the highlight problems requirement.
       - Correlation with, e.g., ITheil, IMLD, or IAtkinson, for aggregation
         techniques that satisfy the symmetry and decomposability
         requirements.
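The two requirements named in the second item can be checked numerically for the Theil index. This is a minimal sketch, assuming that symmetry means invariance under reordering of the values and that decomposability means an exact split into a within-group and a between-group part; the function names and the SLOC values are hypothetical.

```python
import math


def theil(values):
    """Theil index: mean of (x/mu) * ln(x/mu). Assumes all values > 0."""
    mu = sum(values) / len(values)
    return sum((x / mu) * math.log(x / mu) for x in values) / len(values)


def theil_decomposed(groups):
    """Return the (within, between) components of the Theil index over groups."""
    all_values = [x for g in groups for x in g]
    n = len(all_values)
    mu = sum(all_values) / n
    within = 0.0
    between = 0.0
    for g in groups:
        mu_g = sum(g) / len(g)
        # Share of the total held by this group: (n_g / n) * (mu_g / mu).
        share = (len(g) / n) * (mu_g / mu)
        within += share * theil(g)
        between += share * math.log(mu_g / mu)
    return within, between


# Hypothetical class-level SLOC values, grouped by package.
groups = [[12, 40, 7], [100, 250], [5, 5, 9, 30]]
flat = [x for g in groups for x in g]

# Symmetry: reordering the values does not change the index.
print(theil(flat), theil(list(reversed(flat))))

# Decomposability: within + between reproduces the overall index.
within, between = theil_decomposed(groups)
print(theil(flat), within + between)
```

The between-group term depends only on group sizes and group means, which is what allows an aggregate at one level (e.g., package) to be related to aggregates at the next level (e.g., system).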




Conclusions                                                          31/32

     Existing aggregation techniques

       Theoretical analysis
         - root-cause analysis using
         - mathematical properties of
       Empirical analysis
         - methodology and tooling
         - correlation studies with different objectives, metrics, systems,
           versions, aggregation techniques, aggregation levels

     Requirements for one-to-many aggregation techniques for software metrics

       - Determine an optimal partitioning
       - Social organization of software projects
       - Extensions:
           - other software metrics
           - non-software domains
       - Apply the same techniques to aggregation of combined metrics data
       - New one-to-many aggregation techniques for software metrics

Publications                                                         32/32

     [Figure: first pages of four papers, overlapping.]

       - Comparative Study of Software Metrics' Aggregation Techniques
         Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand
         Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513,
         5600 MB Eindhoven, The Netherlands
         Keywords: software metrics, maintainability, aggregation techniques

       - You Can't Control the Unfamiliar: A Study on the Relations Between
         Aggregation Techniques for Software Metrics
         Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand
         Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513,
         5600 MB Eindhoven, The Netherlands

       - By No Means: A Study on Aggregating Software Metrics
         Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand
         Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513,
         5600 MB Eindhoven, The Netherlands

       - Practical Software Quality Metrics Aggregation
         Karine Mordal (1), Nicolas Anquetil (2), Jannik Laval (2),
         Alexander Serebrenik (3), Bogdan Vasilescu (3), and
         Stéphane Ducasse (2)
         (1) LIASD, University of Paris 8, France
         (2) RMoD Team, INRIA, Lille, France
         Journal of Software Maintenance and Evolution: Research and Practice
         (J. Softw. Maint. Evol.: Res. Pract. 0000; 00:1–15),
         DOI: 10.1002/smr
                                                                         e.g., SLOC. However, metrics are usually de ned on a mean                                                                   the same infor mation. However, we discuss some of the r ationale        or exponential) and fitting its parameters to approximate the Universiteit Eindhoven, The Netherlands
                                                                                                                                                                                                                                                                                                                                           3 Technische
               and distribution fitting [4, 19]. The main advantage of the mean is its metrics-independence: whatever metrics are oriented metrics such as the Chidamber and Kemerer suite choosing between one index or another.
                                                                         level (method, class, package), and should therefore be ag-                                                                 behind
               considered, the mean should be calculated in the same way. However, as the distribution of manyevolution atsoftware or the Lorenz and Kidd suite [9].                                                                                                          metric values observed. The fitted parameters can be then
                                                                         gregated in order to provide insights in the interesting the
                                                                          Distribution fitting consists of selecting a known family of distri- However, software metrics are commonly de ned at micro-
               metrics is skewed [24] the mean becomes unreliable. macro-level (system). In addition to traditional aggrega-                                                                                                                                                  seen as aggregating these values. However, the fitting process
                                                                                                                                                                                                                             I . I NTRODUCTI ON
                                                                                                                                             level (method, class, package), and should therefore be ag-
               butions (e.g., log-normal, exponential or negativebinomial) and fitting its parameters to approximate the metric values gregated at macro-level (system), in order to provide insights
                                                                         tion techniques such as the mean, median, or sum, recently                                                                                                                                           should be repeated whenever a new metric is being consid-
                                                                                                                                                                                                         Software maintenance is an area of software engineering              ered. Moreover, it is still a matter of controversy whether,
               observed. However, the fitting process should be repeated whenever a new metric is beingsuch as the Gini, Theil, it is in the study of maintainability and evolution.
                                                                         econometric aggregation techniques, considered. Moreover,                                                                                                                                                                                                                            SUMMARY
                                                                         and Hoover indices have been proposed. In this paper we                                                                     with deep financial implications. Indeed, it was reported that            e.g., software size is distributed log-normally [16] or double
               still a matter of controversy whether, e.g., software size is distributed log-normally [4] or double Pareto [11].                Popular aggregation techniques include such standard sum-
                                                                         wish to understand whether the aggregation technique in-
                     It is highly desirable, hence, to develop an aggregation approach that would be bothof the relation between of mary statistical measures as mean, median, or sum [19].
                                                                                                                 reliable and independent
                                                                                                                                                                                                     between 60% and 90% of the software budgets represent main-              Pareto [18]. We do not consider the growing fitting. quality assessment of entire software systems, in practice, new issues are
                                                                                                                                                                                                                                                                                                           With distribution need for
                                                                           uences the presence and strength
               the metrics being aggregated. Examples of such approaches are the Gini coe cientindicate that correlation is[22], Their main advantage is universality (metrics-independence):
                                                                                                                                                                                                     tenance and evolution costs [1]–[3]. Furthermore, maintenance               Recently, there is an emerging trendFirst, since most software quality metrics are defined at the level of individual software
                                                                                                                                                                                                                                                                                                           emerging. in using more advanced
                                                                         SLOC and defects. Our results [10] and the Theil index                                                                                                                                                                            components, there is a need for aggregation methods to summarize the results at the system level. Second,
                                                                                                                                             whatever metrics are considered, the measures should be and evolution costs were forecasted to account for more than             aggregation techniques borrowed practical evaluation requires the use of different metrics, with possibly widely varying output ranges,
                                                                                                                                                                                                                                                                                                           since a from econometrics, where
               both well-known in econometrics [6] and recently not strong, software metrics [23, 20]. Comparison of di erent
                                                                          applied to and is in uenced by the aggregation technique.
                                                                                                                                             calculated in the same way. However, as the distribution of North American and European software budgets in
                                                                                                                                                                                                     half of                                                                  they are used to study inequality of a need to combine distribu-
                                                                                                                                                                                                                                                                                                           there is income or welfare these results into a unified quality assessment. Third, since projects vary and
               aggregation techniques was so far missing, however. In this short paper we present the first preliminary results.
                                                                                                                                             many interesting software metrics is skewed [29], the2010 [4]. Similar or even higher figures were reported for
                                                                                                                                                                                                       inter-                                                                 tions [19]–[21]. The motivation for organizations have different perceptions on quality, there is a need to adapt the interpretation of the
                                                                                                                                                                                                                                                                                                           different applying such techniques
                                                                         Categor ies and Subj ect Descr iptor s
                     Remainder of thispaper isorganized asfollows. In Section 2 webriefly introducetheaggregation techniquesbeing pretation of such measures becomes unreliable.
                                                                                                                                                                                                     countries such as Norway [5] and Chile [6].                                                           quality assessment to the perception of
                                                                                                                                                                                                                                                                              to software metrics is twofold. First, as numerous countries the users performing it. In this paper we identify the requirements for
               compared. Section 3 compares the theoretical properties of di erent aggregation techniques. Section 4 described the Alternatively, distribution tting [6, 26, 29] consists of se-
                                                                         D.2.7 [Software Engineering]: Distribution, Maintenance,                                                                                                                                                                          a practical aggregation method, and present the Squale model for metric aggregation, specifically designed
               empirical studies conducted and, finally, Section 5 discusses related work and concludes. [Software Engineer-                                                                              Controlling software maintenance costs requires predicting           have few rich and many poor, numerous software systems
                                                                         and Enhancement corrections; D.2.8                                  lecting a known family of distributions (e.g., log-normal or                                                                                                  to address the needs of practitioners. We empirically validate the adequation of Squale through experiments
                                                                                                                                             exponential) and tting its parameters to approximate the
                                                                                                                                                                                                     how the system will evolve in the future, which in turn                  have few very big or complex Eclipse. Additionally, wesmall or the Squale model to both traditional aggregation techniques (e.g., the
                                                                                                                                                                                                                                                                                                           on components, and many compare
                                                                         ing]: Metrics complexity measures
                                                                                                                                             metric values observed. The tted parameters can be then a better understanding of software evolution [7]–[9].
                                                                                                                                                                                                     requires                                                                 simple ones [15], [22], [23]. Consequently, it is commoneconometric inequality indices (e.g., the Gini or the Theil indices), recently
                                                                                                                                                                                                                                                                                                           arithmetic mean), as well as to both
               2. Aggregation techniques                                                                                                     considered as aggregating these values. However, the A ttingpopular approach to assessing software maintainability and           for software metrics, as well as for econometric variables metrics. Copyright c 0000 John Wiley & Sons, Ltd.
                                                                                                                                                                                                                                                                                                           applied to aggregation of software to
                                                                         Gener al Ter ms                                                     process should be repeated whenever a new metric predicting its evolution involves performing measurements on
                                                                                                                                                                                                      is be-                                                                  have strongly-skewed distributions (Figure 1).
                     In this section we briefly present the mathematical definitions of the aggregation techniques to be evaluated. Let ing considered. Moreover, it is still a matter of controversy
                                                                         Measurement, Economics, Experimentation                                                                                     code artifacts. It starts off by identifying a number of specific            Second, the shape of these distributions, which appear
                                                                                                                                                                                                                                                                                                           Received . . .
               {x1, . . . , xn} be the set of values to be aggregated. Then, the mean, denoted as x, is defined as 1 n xi .
                                                                                                      ¯              n    i=1                whether, e.g., software size is distributed log-normally [6] or
                                                                                                                                                                                                     properties of the system under investigation, and then collect-          visually to follow a power law, renders the use of traditional
                                                                         Keywor ds                                                           double Pareto [14].                                     ing the corresponding software metrics and analyzing their                                            KEY WORDS: software metrics; software quality; aggregation; inequality indices
                                                                                                                                                                                                                                                                              aggregation techniques such as the sample mean and variance
                                                                                                                                                Recently, there is an emerging trend in using more ad-
                   ∗ Corresponding author                                Software metrics, maintainability, aggregation techniques                                                                   evolution. Although it is debatable whether one cannot control
                                                                                                                                             vanced aggregation techniques, that are both reliable, as well
                                                                                                                                                                                                                                                                              questionable at best. Indeed, it was reported that many impor-
                     Email addresses: b.n.vasilescu@student.tue.nl (Bogdan Vasilescu), a.serebrenik@tue.nl (Alexander Serebrenik),                                                                   what one cannot measure, it is without a doubt that collecting
                                                                                                                                             as general. Examples of such approaches are the Gini coe -                                                                       tant relationships between software artifacts follow a power-
               m.g.j.v.d.brand@tue.nl (Mark van den Brand)               1. I NTRODUCTI ON                                                                                                           and analyzing metrics helps increase one’s familiarity and
                                                                                                                                             cient [11], the Theil index [28], and the Hoover index [15], all                                                                 law distribution [16], [25], and it is known that a power-law
                                                                            Software maintenance is an area of software engineering          well-known in econometrics for their applicability to understanding of the analyzed systems.
                                                                                                                                                                                                     study-                                                                   distribution may not have a finite mean and variance [22].                 1. INTRODUCTION
                                                                         with deep nancial implications. Indeed, it was reported             ing income inequality [7], and recently applied to software
               Preprint submitted to Elsevier                            that up to 90% of the software budgets represent mainte- 2011 metrics [27, 30, 13, 31].
                                                                                                                                   June 27,                                                                                                                                                                Software metrics are becoming part of the software development fabric, essential to understanding
                                                                         nance and evolution costs [10, 3]. Thus, in order to control           In this preliminary study, based on the assumption that                                                                                                    whether the quality of the software we are building corresponds to our expectations [Pfl08]. As
                                                                                                                                             size is a good predictor for defects, hence size and defects
                                                                         software maintenance costs, it is desirable, e.g., to predict                                                                                                                                                                     a consequence, many different metrics have been proposed, as well as a plethora of tools to
                                                                         faulty components early in the development phase.                   should be statistically related, we wish to understand whether
                                                                                                                                             the aggregation technique in uences the presence and strength                                                                                                 computethem and perform quality assessments. Considering thedifferent stakeholdersparticipating
                                                                            Fault prediction models usually employ software metrics
                                                                         which were previously shown to be a strong predictor for de-        of this relation. Brie y, our results indicate that correlation                                                                                               in software projects (e.g. developers, managers, users), quality needs to be evaluated at different
                                                                         fects [9, 4, 21, 22, 20, 12]. Such a metric is size, measured in    between SLOC and defects is not strong, and is in uenced                                                                                                      levels of detail. Practical application of software metrics is, however, challenged by (i) the need
                                                                                                                                             by the aggregation technique.                                                                                                                                 to combine different metrics as recommended by quality-model design methods such as Factor-
                                                                                                                                                                                                                                                                                                           Criteria-Metric (FCM) [MRW76], or Goal-Question-Metric (GQM) [Bas92]; (ii) the need to obtain
                                                                                                                                                           2. M ETHODOL OGY                                                                                                                                insights in quality of the entire system based on the metric values obtained for low-level system
                                                                       Permission to make digital or hard copies of all or part of this work for                                                                                                                                                           elements such as classes and methods; and (iii) the need to fine tune the quality model to different
                                                                       personal or classroom use is granted without fee provided that copies are             We apply correlation analysis to SLOC data of Java classes
                                                                       not made or distributed for profit or commercial advantage and that copies           aggregated at package level using di erent aggregation tech-                                                                                    quality standards employed by different organizations. We detail each challenge separately.
                                                                       bear this notice and the full citation on the first page. To copy otherwise, to      niques, and defects (bug count per package). As a by-                                                                                              First, a practical quality assessment needs to combine the results of various methods to answer
                                                                       republish, to post on servers or to redistribute to lists, requires prior specific   product of our evaluation, we also study the correlation be-                                                                                    specific questionsassuggested by such modelsasFactor-Criteria-Metric (FCM) [MRW76], or Goal-
                                                                       permission and/or a fee.
                                                                       ICSE ’ 11, May 21–28, 2011, Waikiki, Honolulu, HI, USA                              tween the di erent aggregation techniques themselves. The                                                                                       Question-Metric (GQM) [Bas92]. For example, cyclomatic complexity might be combined with test
                                                                       Copyright 2011 ACM 978-1-4503-0593-8/11/05 ...$10.00.                               choice for aggregating data from class to package level rather

                                                                                                                                                                                                                                                                                                            Correspondence to: INRIA Team RMod, Parc Scientifique de la Haute Borne, 40, avenue Halley. Bt.A, Park Plaza,
                                                                                                                                                                                                                                                                                                           59650 Villeneuve d’ Ascq, France. E-mail: Nicolas.Anquetil@inria.fr

                                                                                                                                                                                                                                                                                                           Copyright c 0000 John Wiley & Sons, Ltd.
                                                                                                                                                                                                                                                                                                           Prepared using smrauth.cls [ Version: 2010/05/10 v2.00]




              BeNeVol 2010                                                                                                        WETSoM 2011                                                                                            ICSM 2011                                                                                                              JSME


Master Thesis presentation

Master Thesis presentation

  • 1. Analysis of Advanced Aggregation Techniques for Software Metrics. Final presentation. Bogdan Vasilescu, b.n.vasilescu@student.tue.nl. Supervisor: Dr. Alexander Serebrenik. July 20, 2011.
  • 2. Analysis of advanced aggregation techniques for software metrics. Most metrics do not have a definition at system level.
  • 7. Analysis of advanced aggregation techniques for software metrics. “Designing a sound aggregation of software metrics is not obvious and it is still an open issue.” [CSS09] Goal: derive requirements for aggregation techniques for software metrics.
  • 9. Aggregation of software metrics. Many to one: same artifact, different metrics (example: Maintainability Index). One to many: same metric, different artifacts (example: Weighted Methods per Class).
  • 11. Approach. Study existing aggregation techniques: traditional (e.g., mean, median), inequality indices (e.g., Gini, Theil), and threshold-based (e.g., SIG, Squale). Through theoretical analysis and empirical analysis, derive requirements for one-to-many aggregation techniques for software metrics.
  • 12. Inequality indices. Econometrics: measure/explain the inequality of income or wealth. Software metrics and econometric variables have distributions with similar shapes. [Figure: two histograms. Left: Source Lines of Code, freecol-0.9.4 (x-axis: SLOC per class; y-axis: Frequency). Right: Household income in Ilocos, Philippines (1998) (x-axis: Income; y-axis: Frequency).]
  • 15. Degree of concentration of functionality. Lorenz curve for SLOC in Hibernate 3.6.0-beta4 (x-axis: % Classes; y-axis: % SLOC). With $A$ and $B$ the two areas marked on the plot, $I_{Gini} = \frac{A}{A+B} = 2A$; $I_{Hoover}$ is also marked on the plot. Measure inequality between individuals (e.g., classes) and between groups (e.g., components).
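
The Lorenz-curve reading of the two indices can be sketched in a few lines of Python. This is a minimal illustration, not the thesis tooling. It assumes the usual convention that $A$ is the area between the diagonal of perfect equality and the Lorenz curve, that $B$ is the area under the curve, and that the Hoover index is the largest vertical gap between the diagonal and the curve. The function names and the sample SLOC values are hypothetical.

```python
def lorenz_curve(values):
    """Return Lorenz curve points (share of classes, share of metric total)."""
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(ordered, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini_from_lorenz(values):
    """Gini index as 2A, with A = 1/2 - B and B the area under the Lorenz curve."""
    pts = lorenz_curve(values)
    # The curve is piecewise linear, so the trapezoid rule gives B exactly.
    area_b = sum((x1 - x0) * (y0 + y1) / 2
                 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    area_a = 0.5 - area_b
    return 2 * area_a


def hoover_from_lorenz(values):
    """Hoover index as the largest vertical gap between diagonal and curve."""
    return max(p - share for p, share in lorenz_curve(values))


# Hypothetical SLOC-per-class values for one small system.
sloc = [10, 20, 30, 40, 400]
print(gini_from_lorenz(sloc), hoover_from_lorenz(sloc))
```

Both functions return 0 when every class has the same size and grow as functionality concentrates in a few classes.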
  • 16. Degree of concentration of functionality. When computing the inequality within the entire population, it is often desirable to assess the contribution of the inequality between the groups. Inequality can be measured between individuals (e.g., classes) or between groups (e.g., components). Decomposability: $I(X) = I_{within} + I_{between} = \sum_{j=1}^{m} \omega_j I(X_j) + I_{between}$.
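
As one concrete instance of decomposability, the sketch below splits the Theil index into a within-group and a between-group term. It assumes strictly positive metric values and uses the standard Theil weights, where $\omega_j$ is group $j$'s share of the metric total; whether the slide intends exactly these weights is an assumption. Function names and the package data are hypothetical.

```python
import math


def theil(values):
    """Theil index of a collection of strictly positive values."""
    mean = sum(values) / len(values)
    return sum((x / mean) * math.log(x / mean) for x in values) / len(values)


def theil_decomposition(groups):
    """Return (I_within, I_between) for a partitioning into groups."""
    total = sum(sum(g) for g in groups)
    count = sum(len(g) for g in groups)
    mean = total / count
    within = 0.0
    between = 0.0
    for g in groups:
        weight = sum(g) / total          # omega_j: share of the metric total
        group_mean = sum(g) / len(g)
        within += weight * theil(g)      # omega_j * I(X_j)
        between += weight * math.log(group_mean / mean)
    return within, between


# Hypothetical SLOC per class, grouped by package.
packages = [[10, 20, 30], [200, 400]]
within, between = theil_decomposition(packages)
overall = theil([x for g in packages for x in g])
print(within + between, overall)  # the two sides of the decomposition
```

The two printed numbers agree up to floating-point rounding, which is the decomposability identity for this index.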
  • 18. Traceability via decomposability. Share of inequality explained by the partitioning $G = \{G_1, \ldots, G_m\}$: $R(G) = \frac{I_{between}(G)}{I(X)}$. Which individuals (classes in package) contribute to 80% of the inequality of SLOC? Which class contributes the most to the inequality?
  • 19. Traceability via decomposability. Lemma: Let $X = \{x_1, x_2, \ldots, x_n\}$ be a collection of values such that $x_1 \le x_i \le x_n$. Then it is either $x_1$ or $x_n$ that contributes the most to the inequality measured using $I_{Theil}$, i.e., it is either the partitioning $(\{x_1\}, X \setminus \{x_1\})$ or the partitioning $(\{x_n\}, X \setminus \{x_n\})$ that provides the best explanation for the inequality measured using $I_{Theil}$.
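
The lemma can be checked by brute force on a small example: single out each value in turn, compute the explained share $R(G)$ for the two-group partitioning, and see which value scores highest. The sketch assumes strictly positive values that are not all equal, and uses the standard between-group term of the Theil index. All function names and the SLOC data are hypothetical.

```python
import math


def theil(values):
    """Theil index of a collection of strictly positive values."""
    mean = sum(values) / len(values)
    return sum((x / mean) * math.log(x / mean) for x in values) / len(values)


def theil_between(groups):
    """Between-group term of the Theil decomposition."""
    total = sum(sum(g) for g in groups)
    count = sum(len(g) for g in groups)
    mean = total / count
    return sum((sum(g) / total) * math.log((sum(g) / len(g)) / mean)
               for g in groups)


def explained_share(values, index):
    """R(G) for the partitioning ({x_index}, X without x_index)."""
    singled_out = [values[index]]
    rest = values[:index] + values[index + 1:]
    return theil_between([singled_out, rest]) / theil(values)


def best_explaining_index(values):
    """Index of the value whose singling-out explains the most inequality."""
    return max(range(len(values)), key=lambda i: explained_share(values, i))


# Hypothetical SLOC per class within one package.
sloc = [12, 30, 45, 60, 900]
best = best_explaining_index(sloc)
print(sloc[best], explained_share(sloc, best))
```

For this data the largest class is the one selected, which is consistent with the lemma: only the minimum and the maximum need to be examined.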
  • 20. Other properties of inequality indices: Symmetry. Inequality stays the same for any permutation of the population.
  • 23. Other properties of inequality indices: Population principle. Inequality does not change if the population is replicated any number of times.
  • 26. Other properties of inequality indices: Transfers principle. A transfer from a rich man to a poor man (without reversing their position) should decrease inequality.
  • 29. Other properties of inequality indices: Transfers principle. A transfer from a rich man to a poor man (without reversing their position) should decrease inequality. [Figure: example with the values 20, 36, 45, 30, 36.]
  • 30. Other properties of inequality indices: Scale invariance. Inequality does not change if all values are multiplied by the same constant.
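The four properties listed on these slides can be checked empirically for a given index. The sketch below does so for the Gini index, using the rank-weighted formula on sorted values; the sample values and the transfer of 5 units are hypothetical and chosen only for illustration.

```python
import math


def gini(values):
    """Gini index via the rank-weighted formula on sorted values."""
    xs = sorted(values)
    n = len(xs)
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked / (n * sum(xs)) - (n + 1) / n


base = [20, 30, 36, 36, 45]         # hypothetical metric values
transferred = [25, 30, 36, 36, 40]  # 5 units moved from the largest to the smallest

checks = {
    # Reordering the population leaves the index unchanged.
    "symmetry": math.isclose(gini(base), gini(base[::-1])),
    # Replicating the population three times leaves the index unchanged.
    "population principle": math.isclose(gini(base), gini(base * 3)),
    # Multiplying every value by the same constant leaves the index unchanged.
    "scale invariance": math.isclose(gini(base), gini([7 * x for x in base])),
    # A rich-to-poor transfer that keeps the ordering lowers the index.
    "transfers principle": gini(transferred) < gini(base),
}
for name, holds in checks.items():
    print(f"{name}: {holds}")
```

A numerical check on one data set is not a proof, but it is a quick way to spot an index or an implementation that violates one of the properties.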
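  • 32. Summary: Properties compared per inequality index: symmetry (Sym.), invariance (Inv.), decomposability (Dec.), population principle (Pop.), transfers principle (Tra.). Invariance: $I_{Gini}$ ×, $I_{Theil}$ ×, $I_{MLD}$ ×, $I_{Hoover}$ ×, $I^{\alpha}_{Atkinson}$ ×, $I^{\beta}_{Kolm}$ + (× = multiplication, + = addition). Problems include: the domain is not always $\mathbb{R}^n$; no distinction between all values equal but low and all values equal but high.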
  • 33. Threshold-based aggregation techniques: Two types. Hard thresholds: improvements in quality are not reflected as long as the metrics stay within certain boundaries (e.g., SIG). Soft thresholds: do not exhibit staircasing effects (e.g., Squale).
  • 34. The Squale Quality Model: Metrics → Individual Marks in [0,3] → Global Mark in [0,3].
  • 35. The Squale Quality Model: Metrics → Individual Marks in [0,3] → Global Mark in [0,3]. [Figure: Individual Mark (IM), from 0.0 to 3.0, as a function of SLOC per method.]
  • 37. Properties of Squale aggregation: Symmetry. Population principle. Anti-transfers principle. [Figure: example with the values 20, 36, 45, 30, 36.]
  • 38. Properties of Squale aggregation: Lemma. $I_{Kolm}^{\log\lambda}(x_1, \ldots, x_n) + I_{Squale}^{\lambda}(x_1, \ldots, x_n) = \bar{x}$. Lemma. For all $c \in \mathbb{R}$ it holds that $I_{Squale}^{\lambda}$ is “unit translatable”, i.e., $I_{Squale}^{\lambda}(x_1 + c, \ldots, x_n + c) = I_{Squale}^{\lambda}(x_1, \ldots, x_n) + c$. Inequality indices are invariant with respect to either multiplication or addition.
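Both lemmas can be verified numerically. The slides do not show the definitions, so the sketch below assumes the commonly published forms: the Squale global mark $-\log_\lambda\big(\frac{1}{n}\sum_i \lambda^{-x_i}\big)$ and the Kolm index $\frac{1}{\beta}\ln\big(\frac{1}{n}\sum_i e^{\beta(\bar{x} - x_i)}\big)$, with $\beta = \ln\lambda$. The marks and function names are hypothetical.

```python
import math


def squale(marks, lam):
    """Squale global mark (assumed form): -log_lam(mean of lam ** -mark)."""
    n = len(marks)
    return -math.log(sum(lam ** (-m) for m in marks) / n, lam)


def kolm(values, beta):
    """Kolm index (assumed form): (1/beta) * ln(mean of exp(beta * (mean - x)))."""
    n = len(values)
    mean = sum(values) / n
    return math.log(sum(math.exp(beta * (mean - x)) for x in values) / n) / beta


marks = [3.0, 3.0, 0.5, 2.0]  # hypothetical individual marks in [0, 3]
lam = 3                       # the weight used for Squale in the results slides
mean = sum(marks) / len(marks)

# First lemma: Kolm (with beta = ln(lam)) plus Squale equals the mean.
print(math.isclose(kolm(marks, math.log(lam)) + squale(marks, lam), mean))

# Second lemma: adding c to every mark adds c to the Squale aggregate.
c = 0.25
print(math.isclose(squale([m + c for m in marks], lam), squale(marks, lam) + c))

# The aggregate lies below the mean, so low marks are emphasized.
print(squale(marks, lam) < mean)
```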
  • 39. Summary: We distill: highlighting undesirable values in the aggregated result. However, problems include: thresholds should be derived and validated; a high rating is not necessarily an indication of good software engineering practices; not decomposable.
  • 40. Approach: Study existing aggregation techniques: traditional (e.g., mean, median); inequality indices (e.g., Gini, Theil); threshold-based (e.g., SIG, Squale). Theoretical analysis. Empirical analysis. Derive requirements for one-to-many aggregation techniques for software metrics.
  • 41. Empirical evaluation
  • 42. Pilot study: Aggregate SLOC from class to package level. Study statistical correlation between: aggregation techniques and number of defects per package; pairs of aggregation techniques. Case studies: ArgoUML, Adempiere, Mogwai. Questions: Does the aggregation technique influence correlation with bugs? Which aggregation techniques convey the same information?
  • 43. Pilot study: Aggregate SLOC from class to package level. Study statistical correlation between: aggregation techniques and number of defects per package; pairs of aggregation techniques. Case studies: ArgoUML, Adempiere, Mogwai. Questions: Does the aggregation technique influence correlation with bugs? Correlation between SLOC and defects is not strong, and is influenced by the aggregation technique. Which aggregation techniques convey the same information? $I_{Gini}$, $I_{Theil}$, $I_{MLD}$, $I_{Hoover}$, and $I_{Atkinson}$ convey the same information.
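The pilot study setup can be sketched as follows: aggregate class-level SLOC per package with different techniques, then correlate each aggregate with the per-package defect count. The results slides report Kendall correlation coefficients, so Kendall's tau-b is used here; using it for the pilot study is an assumption, and the package names, SLOC values, and defect counts are entirely hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation, with the usual correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both factors
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def mean(values):
    return sum(values) / len(values)


def gini(values):
    xs = sorted(values)
    n = len(xs)
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked / (n * sum(xs)) - (n + 1) / n


# Hypothetical data: package -> (SLOC of its classes, number of defects).
packages = {
    "core": ([120, 40, 300, 25], 9),
    "ui": ([60, 55, 70], 2),
    "util": ([10, 15, 12, 400], 5),
    "io": ([80, 90], 1),
}
defects = [d for _, d in packages.values()]
for name, aggregate in (("mean", mean), ("gini", gini)):
    scores = [aggregate(sloc) for sloc, _ in packages.values()]
    print(name, round(kendall_tau_b(scores, defects), 2))
```

Replacing `defects` with the scores of a second aggregation technique gives the pairwise comparison between techniques from the second research question.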
  • 44. Threats to validity:
      Threat      | Pilot
      Metric      | SLOC
      System      | ArgoUML, Adempiere, Mogwai
      Version     | single
      Technique   | traditional, ineq. indices
      Aggr. level | class–package
  • 45. Threats to validity:
      Threat      | Pilot                      | Subsequent studies
      Metric      | SLOC                       | SLOC, LOC, NOS, NOSt, DIT, NOC, PBS, PLwC
      System      | ArgoUML, Adempiere, Mogwai | Qualitas Corpus: 106 Java open-source systems, 430K files, 57 MSLOC
      Version     | single                     | 414 from 13/106 systems (> 10 versions)
      Technique   | traditional, ineq. indices | traditional, ineq. indices, threshold-based
      Aggr. level | class–package              | class–package, method–class
  • 46. Results (1): $I_{Gini}$, $I_{Theil}$, $I_{MLD}$, $I_{Atkinson}$, and $I_{Hoover}$ always convey the same information. [Figure: correlation (from -1.0 to 1.0) for each pair of these indices, for SLOC and for DIT; the percentages annotated per pair range from 89% to 92% for SLOC and from 85% to 89% for DIT.]
  • 47. Results (2): $I_{Kolm}$ shows high correlation with the mean for size metrics. [Figure: Kendall correlation coefficient (from -1.0 to 1.0) between mean and Kolm, for SLOC, DIT, and PLwC.]
  • 48. Results (3): Superlinear (e.g., $I_{Theil}$–$I_{Gini}$) and chaotic (e.g., $I_{Theil}$–$I_{Kolm}$) patterns can be observed in the scatter plots. [Figure: scatter plots for compiere (SLOC). Theil vs. Gini: Kendall 0.94, p-value 0.00. Theil vs. Kolm: Kendall 0.25, p-value 0.01.]
  • 49. Results (4): Changing the aggregation level to class level does not affect the correlation between various aggregation techniques as measured at package level. [Figure: Kendall correlation coefficient (from -1.0 to 1.0) for Gini–Theil, Theil–Atkinson, and Theil–MLD (SLOC), each annotated 100%.]
  • 50. Results (5): System size does influence the correlation between aggregation techniques, e.g., $I_{Theil}$–$I_{Kolm}$ increases with system size. [Figure: hibernate, Kendall(Theil(SLOC), Kolm(SLOC)) over 86 releases, from 0.8.1 to 3.6.0-beta4; y-axis: correlation coefficient, from 0.0 to 1.0.]
  • 51. Results (6): SIG and Squale correlate positively with each other and negatively with all other aggregation techniques. [Figure: Kendall correlation coefficient (from -1.0 to 1.0) for Squale(3)–SIGd, Gini–Squale(3), and Theil–Squale(3) (SLOC), each annotated 95%.]
  • 52. Results (7): Inequality indices are less appropriate for highlighting undesirable values unless assumptions about their number can be made. [Figure: three panels, "Squale (weight = 3) aggregate for different percentages of perfect IMs", "Theil aggregate for different percentages of perfect IMs", and "Kolm aggregate for different percentages of perfect IMs"; x-axis: percentage of imperfect marks (0 to 100); y-axes: average aggregate and average mean range; series: range (0, 0.1), range [0.1, 0.5), range [0.5, 1), range [1, 2), range [2, 3).]
  • 53. Summary: We distill: correlation with Squale or SIG for aggregation techniques that satisfy the "highlight problems" requirement; correlation with $I_{Theil}$, $I_{MLD}$, or $I_{Atkinson}$, e.g., for aggregation techniques that satisfy the symmetry and decomposability requirements.
  • 54. Conclusions: Existing aggregation techniques. Empirical analysis: methodology and tooling; correlation studies with different objectives, metrics, systems, versions, aggregation techniques, aggregation levels. Theoretical analysis: root-cause analysis; mathematical properties. Requirements for one-to-many aggregation techniques for software metrics.
  • 55. Conclusions: Existing aggregation techniques. Empirical analysis: methodology and tooling; correlation studies with different objectives, metrics, systems, versions, aggregation techniques, aggregation levels. Theoretical analysis: root-cause analysis; mathematical properties. Requirements for one-to-many aggregation techniques for software metrics. Social organization. Determine an optimal partitioning of software projects. Extensions: other software metrics; non-software domains. Apply the same techniques to aggregation of combined metrics data. New one-to-many aggregation techniques for software metrics.
  • 56. Publications:
      - Comparative Study of Software Metrics' Aggregation Techniques. Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand. BeNeVol 2010.
      - By No Means: A Study on Aggregating Software Metrics. Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand. WETSoM 2011.
      - You Can't Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics. Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand. ICSM 2011.
      - Practical Software Quality Metrics Aggregation. Karine Mordal, Nicolas Anquetil, Jannik Laval, Alexander Serebrenik, Bogdan Vasilescu, Stéphane Ducasse. Journal of Software Maintenance and Evolution: Research and Practice (JSME).