Metrics are usually computed at a low level:
           classes, methods, …




/ W&I / MDSE        5-10-2011 PAGE 0
Multitude of data values obscures a general
      picture of the system maintainability




That we are actually interested in!




You Can't Control the Unfamiliar:
A Study on the Relations Between Aggregation Techniques for Software Metrics

 Bogdan Vasilescu
 Alexander Serebrenik
 Mark van den Brand

Two kinds of aggregation
     • Same metrics, different artifacts
     • Same artifact, different metrics




Various techniques can be found in the literature
     • Same metrics, different artifacts:
         • Traditional: mean, median, sum, …
         • Econometric inequality indices: Gini, Theil, Hoover, Kolm, Atkinson
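
As a rough illustration of this kind of aggregation (same metric, different artifacts), the sketch below aggregates hypothetical class-level SLOC values of one package with three traditional techniques and two of the inequality indices. The input values and the helper names `gini` and `hoover` are assumptions for illustration only. Gini is computed from the mean absolute difference over all pairs, and Hoover as the fraction of the total that would have to be redistributed to make all values equal; the slides do not state which exact formulas the study used.

```python
from statistics import mean, median


def gini(values):
    """Gini coefficient: mean absolute difference over all ordered pairs, divided by 2 * mean."""
    n = len(values)
    mu = sum(values) / n
    if mu == 0:
        return 0.0
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mu)


def hoover(values):
    """Hoover index: fraction of the total that must be moved to make all values equal."""
    total = sum(values)
    if total == 0:
        return 0.0
    mu = total / len(values)
    return sum(abs(x - mu) for x in values) / (2 * total)


# Hypothetical SLOC of the classes in one package.
class_sloc = [120, 80, 40, 40, 20]

aggregates = {
    "mean": mean(class_sloc),
    "median": median(class_sloc),
    "sum": sum(class_sloc),
    "gini": gini(class_sloc),
    "hoover": hoover(class_sloc),
}
for name, value in aggregates.items():
    print(f"{name:>6}: {value:.3f}")
```

For these values the package gets Gini 0.32 and Hoover 4/15 (about 0.267); a package whose classes all have the same size would get 0 for both.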




Which aggregation technique should we use?


Questions

      1. Which aggregation techniques agree, and to what extent?

      2. What is the nature of the relation between the
         various aggregation techniques?

      3. How does the correlation coefficient change as the
         systems evolve?




Qualitas Corpus 20101126
     • Qualitas Corpus 20101126r, 106 systems
     • FitJava v1.1, 2 packages, 2240 SLOC
     • NetBeans v6.9.1, 3373 packages, 1890536 SLOC




1) Agreement between different techniques

      • Agreement:
          • Aggregation: class SLOC → package
          • Techniques agree if they rank the packages similarly

        We use a rank-based correlation coefficient: Kendall’s τ
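
A minimal sketch of this agreement check, assuming each technique yields one aggregate per package: two techniques agree when Kendall's τ between their per-package values is high. The slides do not say which τ variant was used; tau-b (which corrects for ties) is an assumption here, and the package names and values are hypothetical. In practice, `scipy.stats.kendalltau` computes tau-b by default and also reports a p-value.

```python
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equally long sequences (pairwise O(n^2) version)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue                  # tied in both: excluded from every count
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1           # both techniques order the pair the same way
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x_only)
                 * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical per-package aggregates produced by two techniques.
gini_by_package = {"a": 0.32, "b": 0.10, "c": 0.55, "d": 0.41}
theil_by_package = {"a": 0.18, "b": 0.02, "c": 0.60, "d": 0.35}

packages = sorted(gini_by_package)
tau = kendall_tau_b(
    [gini_by_package[p] for p in packages],
    [theil_by_package[p] for p in packages],
)
print(f"Kendall's tau-b = {tau:.2f}")  # both techniques rank the packages identically
```

This sketch only measures the strength of agreement; testing its statistical significance needs an additional step, such as the p-value that SciPy reports.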


1) Agreement: different inequality indices?
     • Gini, Theil, Hoover, Atkinson agree
         • the aggregates obtained convey the same information
     • Kolm does not!
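
For reference, Theil, Atkinson and Kolm can be sketched as follows. The parameters (ε = 0.5 for Atkinson, β = 1 for Kolm) are assumptions, since the slides do not give them, and the SLOC values are hypothetical. The example also shows a known property that sets Kolm apart from the other indices: Gini, Theil, Hoover and Atkinson are relative measures (unchanged when every value is multiplied by the same factor), whereas Kolm is an absolute measure (unchanged when the same constant is added to every value). This is background, not the paper's stated explanation for the disagreement.

```python
from math import exp, log


def theil(values):
    """Theil T index; zero values contribute 0 (limit of x ln x), negatives are rejected."""
    if any(x < 0 for x in values):
        raise ValueError("Theil requires non-negative values")
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * log(x / mu) for x in values if x > 0) / n


def atkinson(values, epsilon=0.5):
    """Atkinson index with inequality-aversion parameter epsilon (0.5 is an assumption)."""
    if any(x <= 0 for x in values):
        raise ValueError("this sketch of Atkinson requires strictly positive values")
    n = len(values)
    mu = sum(values) / n
    if epsilon == 1:
        ede = exp(sum(log(x) for x in values) / n)  # equally distributed equivalent: geometric mean
    else:
        ede = (sum(x ** (1 - epsilon) for x in values) / n) ** (1 / (1 - epsilon))
    return 1 - ede / mu


def kolm(values, beta=1.0):
    """Kolm index (1/beta) * ln(mean(exp(beta * (mu - x)))), via log-sum-exp for stability."""
    n = len(values)
    mu = sum(values) / n
    terms = [beta * (mu - x) for x in values]
    m = max(terms)
    return (m + log(sum(exp(t - m) for t in terms)) - log(n)) / beta


class_sloc = [120, 80, 40, 40, 20]           # hypothetical package
scaled = [2 * x for x in class_sloc]         # every class twice as large
shifted = [x + 100 for x in class_sloc]      # every class 100 SLOC larger

for name, index in (("theil", theil), ("atkinson", atkinson), ("kolm", kolm)):
    print(f"{name:>8}: original={index(class_sloc):.4f} "
          f"scaled={index(scaled):.4f} shifted={index(shifted):.4f}")
```

Theil and Atkinson give the same value for `class_sloc` and `scaled`, while Kolm gives the same value for `class_sloc` and `shifted`.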




1) Agreement: traditional techniques and inequality indices?

    • mean agrees with
        • Kolm: strong (0.8) and statistically significant (92%)
        • median, standard deviation, and variance

    • sum
        • does not correlate with any other aggregation technique
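
One property that may help when reading the result for the sum (background, not a claim from the slides): the sum grows with the number of classes in a package, while the mean and the Gini index do not change when every class of a package is duplicated. The sketch below checks this on hypothetical values, using the same pairwise Gini formula as a self-contained helper.

```python
from statistics import mean


def gini(values):
    """Gini coefficient from the mean absolute difference over all ordered pairs."""
    n = len(values)
    mu = sum(values) / n
    if mu == 0:
        return 0.0
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mu)


package = [120, 80, 40, 40, 20]   # hypothetical class SLOC
replicated = package * 2          # same package with every class duplicated

for label, vals in (("original", package), ("replicated", replicated)):
    print(f"{label:>10}: sum={sum(vals)} mean={mean(vals)} gini={gini(vals):.3f}")
```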




2) Nature of the relation: Typical patterns

   • Pattern 1:
       • Theil is known to be more sensitive to the rich
       • Theil increases faster when Gini increases
   • Pattern 2:
       • Linear relation with a “fat” head
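
To get a feel for the Gini/Theil pattern, the sketch below computes both indices for a hypothetical family of packages: four equal classes plus one "rich" class that is k times as large, with k doubling each step. In this family both indices rise, and the increase in Theil per unit increase in Gini grows as Gini grows, which is consistent with the pattern on the slide. It is an illustration only, not a reproduction of the study's data or plots.

```python
from math import log


def gini(values):
    """Gini coefficient from the mean absolute difference over all ordered pairs."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mu)


def theil(values):
    """Theil T index for strictly positive values."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * log(x / mu) for x in values) / n


# Hypothetical packages: four equal classes plus one class k times as large.
ks = [1, 2, 4, 8, 16, 32]
points = [(gini([1, 1, 1, 1, k]), theil([1, 1, 1, 1, k])) for k in ks]

for k, (g, t) in zip(ks, points):
    print(f"k={k:>2}  gini={g:.3f}  theil={t:.3f}")

# Increase in Theil per unit increase in Gini between consecutive packages.
slopes = [(t2 - t1) / (g2 - g1) for (g1, t1), (g2, t2) in zip(points, points[1:])]
print("dTheil/dGini:", [round(s, 2) for s in slopes])
```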
Which aggregation technique? (1)

      • Theil, Hoover, Gini and Atkinson agree
          • From the correlation point of view, any of them can be chosen

      • Some might be “better” in specific cases:
          • easy to interpret: Gini ∈ [0,1]
          • provide additional insights: Theil (explanation)
          • negative values: Gini, Hoover
               − affects the domain!
          • sensitive to high values: Theil, Atkinson
          • deviations from uniformity: Gini, Hoover
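
A likely reading of "additional insights: Theil (explanation)" is Theil's additive decomposability: the Theil index over all classes splits exactly into a within-package part and a between-package part, each package weighted by its share of total SLOC. Assuming that is what the slide refers to, the sketch below shows the decomposition on hypothetical packages (`pkg.core`, `pkg.util` and `pkg.ui` are made-up names); it requires strictly positive values.

```python
from math import log


def theil(values):
    """Theil T index; this sketch requires strictly positive values."""
    if any(x <= 0 for x in values):
        raise ValueError("values must be strictly positive")
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * log(x / mu) for x in values) / n


def theil_decomposition(groups):
    """Split the Theil index of all values into (within-group, between-group) parts.

    Each group is weighted by its share of the overall total.
    """
    all_values = [x for values in groups.values() for x in values]
    total = sum(all_values)
    mu = total / len(all_values)
    within = between = 0.0
    for values in groups.values():
        share = sum(values) / total            # group's share of total SLOC
        mu_g = sum(values) / len(values)       # group mean
        within += share * theil(values)        # inequality inside the group
        between += share * log(mu_g / mu)      # inequality between group means
    return within, between


# Hypothetical class SLOC, grouped by (hypothetical) package.
packages = {
    "pkg.core": [120, 80, 40, 40, 20],
    "pkg.util": [30, 25, 10],
    "pkg.ui": [400, 15],
}

within, between = theil_decomposition(packages)
overall = theil([x for values in packages.values() for x in values])
print(f"overall={overall:.4f}  within={within:.4f}  between={between:.4f}")
```

The two parts add up to the overall index, so one can tell how much of the system-level inequality comes from differences between packages rather than inside them.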




Which aggregation technique? (2)

      • Kolm and mean agree
          • Kolm is reliable for skewed distributions
            − better alternative (“by no means”)
          • Not in the paper:
            − agreement observed for NOC (number of children)
            − but not for DIT (depth of inheritance tree)!




Conclusions





