Metrics are usually computed at a low level:
           classes, methods, …




/ W&I / MDSE        3-11-2012 PAGE 0
Multitude of data values obscures a general
      picture of the system maintainability




That we are actually interested in!




You Can't Control the Unfamiliar:
A Study on the Relations
Between Aggregation
Techniques for Software Metrics

 Bogdan Vasilescu
 Alexander Serebrenik
 Mark van den Brand
Two kinds of aggregation
     • Same metrics, different artifacts
     • Same artifact, different metrics




Various techniques can be
   found in the literature
Same metrics, different artifacts:
     • Traditional: mean, median, sum, …
     • Econometric inequality indices: Gini, Theil, Hoover, Kolm, Atkinson
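
For readers unfamiliar with the econometric indices named above, the following is a minimal sketch of their textbook definitions. The function names and the default parameters (`epsilon=0.5` for Atkinson, `kappa=1.0` for Kolm) are assumptions made for illustration; the slides do not say which parameter values or implementation the authors used. Inputs are assumed to be non-negative with a positive mean.

```python
from math import exp, log


def _mean(values):
    return sum(values) / len(values)


def gini(values):
    """Gini index via the sorted-rank formula; 0 means all values are equal."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def theil(values):
    """Theil T index; terms with x == 0 contribute 0 by convention."""
    mu = _mean(values)
    return _mean([(x / mu) * log(x / mu) if x > 0 else 0.0 for x in values])


def hoover(values):
    """Hoover index: share of the total that would need redistributing."""
    mu = _mean(values)
    return sum(abs(x - mu) for x in values) / (2 * sum(values))


def atkinson(values, epsilon=0.5):
    """Atkinson index; epsilon >= 1 requires strictly positive values."""
    mu = _mean(values)
    if epsilon == 1:
        geometric_mean = exp(_mean([log(x) for x in values]))
        return 1 - geometric_mean / mu
    power = 1 - epsilon
    return 1 - _mean([x ** power for x in values]) ** (1 / power) / mu


def kolm(values, kappa=1.0):
    """Kolm index, computed with a log-sum-exp shift to avoid overflow."""
    mu = _mean(values)
    exponents = [kappa * (mu - x) for x in values]
    shift = max(exponents)
    return (shift + log(_mean([exp(e - shift) for e in exponents]))) / kappa


# Hypothetical class sizes (SLOC) in one package, for illustration only.
sloc = [1, 2, 3, 4]
for index in (gini, theil, hoover, atkinson, kolm):
    print(index.__name__, round(index(sloc), 4))
```

All five functions return 0 when every value is equal; they differ in how strongly they react to a few very large values.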




Which aggregation technique should we use?

Questions

      1. Which aggregation techniques agree, and to what
         extent?

      2. What is the nature of the relation between the
         various aggregation techniques?

      3. How does the correlation coefficient change as the
         systems evolve?




Qualitas Corpus 20101126
     • Qualitas Corpus 20101126r, 106 systems
     • FitJava v1.1, 2 packages, 2240 SLOC
     • NetBeans v6.9.1, 3373 packages, 1890536 SLOC




1) Agreement between different techniques

      • Agreement:
          • Aggregation: Class SLOC → Package
          • Techniques agree if they rank the packages similarly

        We use a rank-based correlation coefficient: Kendall’s τ
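
The agreement check described above can be sketched as follows: aggregate class-level SLOC to package level with two techniques, then compare how the two techniques rank the packages. The package names and SLOC values are made up, and `kendall_tau_b` is a plain O(n²) implementation of the tie-corrected τ-b variant; which variant the study used is not stated on the slide, so treat that choice as an assumption.

```python
from math import sqrt


def gini(values):
    """Gini index via the sorted-rank formula."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def mean(values):
    return sum(values) / len(values)


def kendall_tau_b(xs, ys):
    """Kendall's tau-b: +1 for identical rankings, -1 for reversed ones."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: counts toward neither term
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denominator if denominator else float("nan")


# Hypothetical class-level SLOC, grouped by package.
class_sloc = {
    "pkg.a": [10, 12, 11],
    "pkg.b": [5, 200, 7],
    "pkg.c": [50, 55, 300, 40],
    "pkg.d": [100, 100],
}

# Aggregation: Class SLOC -> Package, once per technique.
by_mean = [mean(v) for v in class_sloc.values()]
by_gini = [gini(v) for v in class_sloc.values()]

tau = kendall_tau_b(by_mean, by_gini)
print(f"Kendall tau-b between mean and Gini: {tau:.2f}")
```

A τ close to 1 means the two techniques order the packages almost identically; a τ near 0 means the aggregates convey different information.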


1) Agreement: different inequality indices?
     • Gini, Theil, Hoover, Atkinson – agree
         • aggregates obtained convey the same information
         • Kolm does not!
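
One property that sets Kolm apart from the other four indices is that it is an absolute measure: adding a constant to every value leaves it unchanged, whereas Gini (like Theil, Hoover and Atkinson) is unchanged when every value is multiplied by a constant. The slides do not say whether this is the cause of the disagreement, so the sketch below only illustrates the property. The data and `kappa=1.0` are assumptions.

```python
from math import exp, log


def gini(values):
    """Gini index via the sorted-rank formula (a relative index)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def kolm(values, kappa=1.0):
    """Kolm index (an absolute index), with a log-sum-exp shift for stability."""
    mu = sum(values) / len(values)
    exponents = [kappa * (mu - x) for x in values]
    shift = max(exponents)
    total = sum(exp(e - shift) for e in exponents)
    return (shift + log(total / len(values))) / kappa


sloc = [10, 20, 30, 40]            # hypothetical class sizes
shifted = [x + 100 for x in sloc]  # every class grows by 100 lines
scaled = [x * 10 for x in sloc]    # every class grows tenfold

print("gini:", gini(sloc), gini(shifted), gini(scaled))
print("kolm:", kolm(sloc), kolm(shifted), kolm(scaled))
```

In the printed output, Gini keeps its value under scaling but drops under shifting, while Kolm does the opposite.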




1) Agreement: traditional and inequality indices?

    • mean
        • Kolm: strong (0.8) and statistically significant (92% of the systems)
        • median, standard deviation, and variance


    • sum
        • does not correlate with any other aggregation technique




2) Nature of the relation: Typical patterns




   • Theil is known to be more sensitive to the rich
   • Theil increases faster when Gini increases
   • Linear relation with a “fat” head
Which aggregation technique? (1)

      • Theil, Hoover, Gini and Atkinson agree
          • Any can be chosen from the correlation point of view


      • Some might be “better” in each specific case
          • easy to interpret: Gini ∈ [0,1]
          • provide additional insights: Theil (explanation)
          • negative values: Gini, Hoover
               − affects the domain!
          • sensitive to high values: Theil, Atkinson
          • deviations from uniformity: Gini, Hoover
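
The "explanation" noted for Theil most likely refers to its decomposability: for any partition of the classes (for example, by package), the overall Theil index equals a within-group part plus a between-group part. The sketch below checks that identity on made-up data; the function names and package names are hypothetical, and all values are assumed to be strictly positive.

```python
from math import log


def theil(values):
    """Theil T index; terms with x == 0 contribute 0 by convention."""
    mu = sum(values) / len(values)
    return sum((x / mu) * log(x / mu) for x in values if x > 0) / len(values)


def theil_decomposition(groups):
    """Split the overall Theil index into (within, between) components.

    `groups` maps a group name to its list of positive values.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_total = sum(all_values)
    grand_mean = grand_total / len(all_values)
    within = between = 0.0
    for values in groups.values():
        share = sum(values) / grand_total   # group's share of the total
        group_mean = sum(values) / len(values)
        within += share * theil(values)
        between += share * log(group_mean / grand_mean)
    return within, between


# Hypothetical class-level SLOC, partitioned by package.
class_sloc = {
    "pkg.a": [10, 12, 11],
    "pkg.b": [5, 200, 7],
    "pkg.c": [50, 55, 300, 40],
}

overall = theil([x for values in class_sloc.values() for x in values])
within, between = theil_decomposition(class_sloc)
print(f"overall={overall:.4f} within={within:.4f} between={between:.4f}")
```

A large between-group share suggests that the inequality is explained by differences among packages; a large within-group share points to differences among classes inside the same package.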




Which aggregation technique? (2)

      • Kolm and mean agree
          • Kolm is reliable for skewed distributions
            − better alternative (“by no means”)
          • Not in the paper:
            − agreement observed for NOC
            − but not for DIT!




Conclusions





ICSM 2011: You Can't Control the Unfamiliar


Editor's notes

  1. Red line – mean, blue line – median. Further approaches: distribution fitting, quality models (SIG, SQUALE).
  2. Red line – mean, blue line – median. Further approaches: distribution fitting, quality models (SIG, SQUALE).
  3. Percentage of the systems with a statistically significant correlation between the corresponding indices at the 0.05 level. Kendall correlation (rank-based).