Evaluating the presence and impact of bias in bug-fix datasets
Israel Herraiz, UPM
http://mat.caminos.upm.es/~iht
Talk at University of California, Davis
April 11, 2012
This presentation is available at http://www.slideshare.net/herraiz/evaluating-the-presence-and-impact-of-bias-in-bugfix-datasets
Outline

     1.    Who am I and what do I do
     2.    The problem
     3.    Preliminary results
     4.    The road ahead
     5.    Take away and discussion




http://mat.caminos.upm.es/~iht             1 / 34
1. Who am I and what do I do


About me

     • PhD in Computer Science from Universidad
       Rey Juan Carlos (Madrid)
                    •    “A statistical examination of the evolution and properties
                         of libre software”
                    •    http://herraiz.org/phd.html
     • Assistant Professor at the Technical University
       of Madrid
                    •    http://mat.caminos.upm.es/~iht
     • Visiting UC Davis from April to July, hosted by
       Prof. Devanbu
                    •    Kindly funded by a MECD “José Castillejo” grant
                         (JC2011-0093)
What do I do?




2. The problem


Replication in Empirical Software Engineering



Empirical Software Engineering studies are hard to replicate.

Verification and replication are crucial features of an empirical research discipline.

Reusable datasets lower the barrier for replication.




Reusable datasets




           FLOSSMole




The case of the Eclipse dataset
Defect data for all packages in releases 2.0, 2.1 and 3.0

Size and complexity metrics for all files




                   http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/

Bug-fix datasets

     • The Eclipse data is a bug-fix dataset
     • To cross-correlate bugs with files, classes or
       packages, the data is extracted from
               •    Bug tracking systems (fixed bug reports)
               •    Version control systems (commits)
     • Heuristics are used to link bug-fix reports
       to commits
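A bug-fix link is only recovered when the commit message happens to reference the bug, so any fix whose message omits its bug ID is silently missed. A minimal sketch of a typical keyword-and-ID heuristic follows; the keyword list, the `#123` / `bug 123` ID formats and the function name `link_commits_to_bugs` are illustrative assumptions, not the procedure used to build the Eclipse dataset.

```python
import re

# Illustrative heuristic (an assumption, not the Eclipse dataset's exact rules):
# a commit counts as a bug fix if its message contains a fix-related keyword
# and references a bug ID written as "#123" or "bug 123".
FIX_KEYWORDS = re.compile(r"\b(fix(e[sd])?|bug|defect|patch)\b", re.IGNORECASE)
BUG_ID = re.compile(r"(?:#|bug\s*)(\d+)", re.IGNORECASE)


def link_commits_to_bugs(commits, fixed_bug_ids):
    """Return sorted (commit_id, bug_id) links recovered by the heuristic.

    commits: iterable of (commit_id, message) pairs from version control.
    fixed_bug_ids: set of IDs of bug reports marked as fixed in the tracker.
    """
    links = set()
    for commit_id, message in commits:
        if not FIX_KEYWORDS.search(message):
            continue  # no fix-related keyword: the commit is never linked
        for match in BUG_ID.finditer(message):
            bug_id = int(match.group(1))
            if bug_id in fixed_bug_ids:  # discard IDs of unfixed or unknown bugs
                links.add((commit_id, bug_id))
    return sorted(links)


commits = [
    ("c1", "Fix bug 42: NPE in parser"),
    ("c2", "Refactor build scripts"),
    ("c3", "fixes #7 and #99"),
    ("c4", "Handle empty input"),  # suppose this fixed bug 13 but never says so
]
print(link_commits_to_bugs(commits, fixed_bug_ids={42, 7, 13}))
# [('c1', 42), ('c3', 7)]
```

Bug 13 is fixed in the tracker but never linked, which is exactly the kind of missing bug-fix pair the rest of the talk is concerned with.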




A study using the Eclipse dataset




The distribution of software faults

     • The distribution of software faults (over
       packages) follows a Weibull distribution
     • This study can be easily replicated thanks to the
       Eclipse reusable bug-fix dataset
     • If the same data is obtained for other case
       studies, it can also be easily verified and
       extended
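Replicating this result involves fitting a Weibull distribution to the number of faults per package. The sketch below uses least squares on a Weibull probability plot, a simple method chosen here for illustration; it is an assumption, and the study may have used a different estimator (for example maximum likelihood). Packages with zero faults are dropped because ln(0) is undefined, and the function name `fit_weibull_plot` is hypothetical.

```python
import math


def fit_weibull_plot(samples):
    """Estimate Weibull shape k and scale lam from positive samples.

    Linearises the CDF F(x) = 1 - exp(-(x/lam)**k) as
    ln(-ln(1 - F)) = k*ln(x) - k*ln(lam), estimates F with Bernard's
    median ranks (i - 0.3) / (n + 0.4), and fits a least-squares line.
    """
    xs = sorted(x for x in samples if x > 0)  # ln(0) is undefined
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two positive samples")
    us, vs = [], []
    for i, x in enumerate(xs, start=1):
        f = (i - 0.3) / (n + 0.4)
        us.append(math.log(x))
        vs.append(math.log(-math.log(1.0 - f)))
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    if sxx == 0:
        raise ValueError("all samples are equal; the slope is undefined")
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    k = sxy / sxx                    # slope of the Weibull plot = shape
    intercept = mean_v - k * mean_u  # intercept = -k * ln(lam)
    lam = math.exp(-intercept / k)
    return k, lam


# Synthetic check: exact quantiles of a Weibull(k=1.5, lam=10) distribution.
n = 50
xs = [10.0 * (-math.log(1 - (i - 0.3) / (n + 0.4))) ** (1 / 1.5) for i in range(1, n + 1)]
print(fit_weibull_plot(xs))  # approximately (1.5, 10.0)
```

With real fault counts, many packages share the same value; ties simply receive consecutive ranks here, which is a simplification.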




But…




What’s the difference between the two conflicting studies?
     • According to the authors, there are methodological differences
          •     Zhang uses Alberg diagrams
          •     Concas et al. use CCDF plots to fit different
                distributions, and reason about the generative
                process as a model for software maintenance
     • What I suspect is the crucial difference
          •     Zhang reused the Eclipse bug-fix dataset
          •     Concas et al. gathered the data themselves
          •     So the two datasets will have different biases
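To make the methodological difference concrete, here is a minimal sketch computing both views from the same list of per-package fault counts: the points of an Alberg diagram (packages sorted by decreasing number of faults, against the cumulative percentage of faults) and the empirical CCDF, P(X ≥ x). The helper names and the sample counts are hypothetical, and the sketch only produces the plotted points; it does not reproduce either study's distribution fitting. Both views are deterministic functions of the input counts, so any difference between two datasets carries straight through to them.

```python
def alberg_points(faults_per_package):
    """(% of packages, cumulative % of faults), packages sorted by decreasing faults."""
    counts = sorted(faults_per_package, reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no faults to accumulate")
    points, cumulative = [], 0
    for i, c in enumerate(counts, start=1):
        cumulative += c
        points.append((100.0 * i / len(counts), 100.0 * cumulative / total))
    return points


def ccdf_points(faults_per_package):
    """Empirical CCDF: (x, P(X >= x)) for every distinct fault count x."""
    counts = sorted(faults_per_package)
    n = len(counts)
    points = []
    for idx, x in enumerate(counts):
        if idx == 0 or x != counts[idx - 1]:
            points.append((x, (n - idx) / n))  # n - idx packages have >= x faults
    return points


faults = [0, 1, 1, 3, 5]  # hypothetical per-package fault counts
print(alberg_points(faults))
# [(20.0, 50.0), (40.0, 80.0), (60.0, 90.0), (80.0, 100.0), (100.0, 100.0)]
print(ccdf_points(faults))
# [(0, 1.0), (1, 0.8), (3, 0.4), (5, 0.2)]
```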
What’s wrong with the Eclipse bug-fix dataset?




Bug feature bias




    There are other kinds of bias (commit features), but in the case of the two
    Eclipse papers, the distribution concerns package features, not bug or
    commit features.

    RQ1: Will this kind of bias hold for package / class / file
    features?
    RQ2: What’s the impact on defect prediction?
Impact on prediction




Impact on prediction

     J48 tree to classify files as defective or not




Conclusions so far

     •    Developers only mark a subset of the bug-fix pairs,
          so heuristic-based recovery methods only find
          a subset of all bug-fix pairs
     •    The bias appears as a difference in the distribution
          of bug and commit features
     •    The conflict between the two studies about the
          distribution of bugs in Eclipse is likely due to
          differences in the distributions caused by bias
     •    The bias has a large impact on the accuracy of
          prediction models

3. Preliminary results


The distribution of bugs over files

     • Number of bugs per file for the case of ZXing
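The per-file counts plotted here can be derived from the recovered bug-fix links. A minimal sketch, assuming each link has already been resolved to a (bug ID, file path) pair, which is a hypothetical input format; the function names are illustrative. Counting distinct bugs rather than commits avoids counting a bug twice when its fix spans several commits, and files with no linked fix must be listed explicitly or they vanish from the distribution.

```python
from collections import Counter


def bugs_per_file(links, all_files=()):
    """Count distinct bugs per file from (bug_id, file_path) links.

    Files listed in all_files but never touched by a linked fix get 0.
    """
    per_file = {path: 0 for path in all_files}
    for _bug_id, path in set(links):  # a bug fixed in several commits counts once
        per_file[path] = per_file.get(path, 0) + 1
    return per_file


def frequency_table(per_file):
    """Map each bug count to the number of files with that many bugs."""
    return dict(sorted(Counter(per_file.values()).items()))


links = [(1, "a.java"), (1, "a.java"), (2, "a.java"), (3, "b.java"), (4, "c.java"), (5, "c.java")]
counts = bugs_per_file(links, all_files=["a.java", "b.java", "c.java", "d.java"])
print(frequency_table(counts))  # {0: 1, 1: 1, 2: 2}
```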




The distribution of bugs over files

     • Number of bugs per file for the case of Eclipse




The distribution of bugs over files

     • Comparison between the ReLink and the biased
       bug-fix sets (results of the χ² test, p-values)
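The slides do not spell out how the χ² test was set up. A minimal sketch, assuming a Pearson χ² test of homogeneity between two frequency tables that map "number of bugs" to "number of files" (one table per bug-fix set); the p-value comes from the regularised incomplete gamma function, evaluated with its standard series and continued-fraction expansions so that only the Python standard library is needed. All function names are hypothetical. A large p-value means the test finds no evidence that the two distributions differ. The χ² approximation becomes unreliable when expected counts fall below about 5, which is common in the long tail of bugs-per-file counts, so sparse tail categories are usually pooled first.

```python
import math


def _gamma_p_series(a, x):
    """Regularised lower incomplete gamma P(a, x) by its power series (x < a + 1)."""
    term = total = 1.0 / a
    ap = a
    for _ in range(10_000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_cont_frac(a, x):
    """Regularised upper incomplete gamma Q(a, x) by a continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):  # modified Lentz evaluation
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(stat, df):
    """Upper-tail p-value of the chi-square distribution with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _gamma_p_series(a, x)
    return _gamma_q_cont_frac(a, x)


def chi2_homogeneity(table_a, table_b):
    """Pearson chi-square test that two frequency tables share one distribution.

    Each table maps a category (e.g. number of bugs per file) to an observed
    count. Returns (statistic, degrees_of_freedom, p_value).
    """
    cats = [c for c in sorted(set(table_a) | set(table_b))
            if table_a.get(c, 0) + table_b.get(c, 0) > 0]
    rows = [[t.get(c, 0) for c in cats] for t in (table_a, table_b)]
    row_totals = [sum(r) for r in rows]
    if len(cats) < 2 or 0 in row_totals:
        raise ValueError("need two non-empty tables with at least two categories")
    grand = sum(row_totals)
    stat = 0.0
    for j in range(len(cats)):
        col_total = rows[0][j] + rows[1][j]
        for i in range(2):
            expected = row_totals[i] * col_total / grand
            stat += (rows[i][j] - expected) ** 2 / expected
    df = len(cats) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical tables: number of bugs -> number of files.
print(chi2_homogeneity({1: 40, 2: 12, 3: 5}, {1: 55, 2: 15, 3: 8}))
```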




The distribution of bugs over files

     • Comparison between the ReLink and the biased
       bug-fix sets (results of the χ² test, p-values)




     RQ1: Will this kind of bias hold for package / class / file features?

     Not supported by these examples
Time out!

     • So is there no difference between the biased
       and non-biased datasets?
     • And how come the ReLink paper (and others)
       report improved accuracies when using the
       non-biased datasets?
     • What could explain these differences?




Impact on prediction accuracy

     • What is the prediction accuracy using different
       (biased and non-biased) datasets?
     • Three datasets
               •    Biased dataset recovered using heuristics
               •    “Golden” dataset recovered manually
                    •    By Sung Kim et al., not me!
               •    Non-biased dataset obtained using the ReLink
                    tool
     • J48 tree classifier, 10-fold cross-validation
               •    Test sets are always drawn from the golden
                    dataset
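The slide does not detail how the folds are built when training and test data come from different datasets. One possible reading, sketched below as an assumption: the golden dataset's files are split into 10 disjoint test folds, and for each fold the classifier is trained on the rows of whichever dataset is being evaluated (biased, golden or ReLink) with that fold's files removed, so no file appears in both training and test sets. The classifier itself (J48, Weka's implementation of C4.5) is not reproduced, and the row format and the function name `golden_test_splits` are hypothetical.

```python
import random


def golden_test_splits(training_rows, golden_files, k=10, seed=0):
    """Yield (train_rows, test_files) for k-fold cross-validation.

    training_rows: list of (file_id, features, label) from the dataset under study.
    golden_files: file IDs of the golden dataset; test folds are always drawn here.
    """
    files = list(golden_files)
    random.Random(seed).shuffle(files)  # reproducible shuffle before splitting
    for fold_index in range(k):
        test_files = files[fold_index::k]  # every k-th file: near-equal fold sizes
        held_out = set(test_files)
        train_rows = [row for row in training_rows if row[0] not in held_out]
        yield train_rows, test_files


golden = [f"F{i}" for i in range(25)]
rows = [(f, None, False) for f in golden]  # placeholder features and labels
for train, test in golden_test_splits(rows, golden):
    print(len(train), len(test))
```

The test labels would come from the golden dataset in every case, while the training labels come from the dataset under evaluation.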
F-measure values

     • Procedure
          •     Extract 100 subsamples of the same size for
                both datasets
          •     Calculate the F-measure using 10-fold
                cross-validation
               •    The test set is always drawn from the “golden”
                    set
     • Repeat for several subsample sizes
     • Only results for the case of OpenIntents so far
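The F-measure is the harmonic mean of precision and recall for the "defective" class. A minimal sketch of that computation, plus the drawing of equal-size subsamples (without replacement) from two datasets, 100 times per size as in the procedure above; the function names and the fixed random seed are illustrative assumptions, and training the J48 classifier on each subsample is left out.

```python
import random


def f_measure(actual, predicted, positive=True):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def equal_size_subsamples(dataset_a, dataset_b, size, repetitions=100, seed=0):
    """Yield `repetitions` pairs of same-size random subsamples, drawn without replacement."""
    rng = random.Random(seed)
    for _ in range(repetitions):
        yield rng.sample(dataset_a, size), rng.sample(dataset_b, size)


# 2 true positives, 1 false positive, 1 false negative: precision = recall = 2/3.
print(f_measure([True, True, False, False, True], [True, False, True, False, True]))
```

Drawing subsamples of identical size from both datasets is what keeps dataset size from confounding the comparison of F-measures.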


RQ2: Impact on prediction

                           Not clear whether there is any impact


Little warning!

     The size is not exactly the same for the three cases in each boxplot.
     The biased dataset is always the smallest of the three.
     I have to repeat this using exactly the same size for the three datasets.


Preliminary conclusions

     • The biased dataset does not yield the worst
       accuracy when predicting fault proneness for a
       set of (supposedly) unbiased bug fixes and files
               •    Contrary to what previous work reports
     • What is the cause of the reported differences in
       accuracy?
          •     By definition, the so-called biased dataset
                will always be smaller
          •     Dataset size does have an impact on the
                F-measure

4. The road ahead


My workplan at UC Davis

     • Discuss the ideas shown here
          •     Is bias really a problem for defect prediction?
     • Extend the study to more cases
          •     Do you have a dataset of files, bugs, commits,
                metrics? Please let me know!
     • Improve the study
          •     What happens if we break down the data into more
                coherent subgroups?
     • Do the results change at different levels of
       granularity?
5. Take away and conclusions


     • Systematic difference in bug fixes collected by heuristics
     • No observable difference in the statistical properties of the so-called biased dataset
     • Impact on prediction accuracy not clear
     • Ecological inference: what happens at other scales? With other subgroups?
Evaluating the presence and impact of bias in bug-fix datasets

  • 1. Evaluating the presence and impact of bias in bug-fix datasets Israel Herraiz, UPM http://mat.caminos.upm.es/~iht Talk at University of California, Davis April 11 2012 This presentation is available at http://www.slideshare.net/herraiz/evaluating-the-presence-and-impact-of-bias-in-bugfix-datasets
  • 2. Outline 1. Who am I and what do I do 2. The problem 3. Preliminary results 4. The road ahead 5. Take away and discussion http://mat.caminos.upm.es/~iht 1 / 34
  • 3. 1. Who am I and what do I do http://mat.caminos.upm.es/~iht 2 / 34
  • 4. About me • PhD on Computer Science from Universidad Rey Juan Carlos (Madrid) • “A statistical examination of the evolution and properties of libre software” • http://herraiz.org/phd.html • Assistant Professor at the Technical University of Madrid • http://mat.caminos.upm.es/~iht • Visiting UC Davis from April to July hosted by Prof. Devanbu • Kindly funded by a MECD “José Castillejo” grant (JC2011-0093) http://mat.caminos.upm.es/~iht 3 / 34
  • 5. What do I do?
  • 7. Replication in Empirical Software Engineering • Empirical Software Engineering studies are hard to replicate. • Verification and replication are crucial features of an empirical research discipline. • Reusable datasets lower the barrier for replication.
  • 8. Reusable datasets • FLOSSMole
  • 9. The case of the Eclipse dataset • Defect data for all packages in releases 2.0, 2.1 and 3.0 • Size and complexity metrics for all files • http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/
  • 10. Bug-fix datasets • The Eclipse data is a bug-fix dataset • To cross-correlate bugs with files, classes or packages, the data is extracted from • Bug tracking systems (fixed bug reports) • Version control systems (commits) • Heuristics to detect relationships between bug-fix reports and commits
  • 11. A study using the Eclipse dataset
  • 12. The distribution of software faults • The distribution of software faults (over packages) is a Weibull distribution • This study can be easily replicated thanks to the Eclipse reusable bug-fix dataset • If the same data is obtained for other case studies, it can also be easily verified and extended
  • 14. What’s the difference between the two conflicting studies? • According to the authors, there are methodological differences • Zhang uses Alberg diagrams • Concas et al. use CCDF plots to fit different distributions, and reason about the generative process as a model for software maintenance • What I suspect is the crucial difference • Zhang reused the Eclipse bug-fix dataset • Concas et al. gathered the data themselves • So the two datasets will carry different biases
  • 15. What’s wrong with the Eclipse bug-fix dataset?
  • 16. Bug feature bias • There are other kinds of bias (in commit features), but in the two Eclipse papers the distribution concerns package features, not bug or commit features. • RQ1: Will this kind of bias hold for package, class or file features? • RQ2: What’s the impact on defect prediction?
  • 18. Impact on prediction • J48 tree to classify files as defective or not
  • 19. Conclusions so far • Developers only mark a subset of the bug-fix pairs, so heuristics-based recovery methods only find a subset of all bug-fix pairs • The bias appears as a difference in the distributions of bug and commit features • The conflict between the two studies about the distribution of bugs in Eclipse is likely due to differences in the distributions caused by bias • The bias has a great impact on the accuracy of predictor models
  • 21. The distribution of bugs over files • Number of bugs per file for the case of Zxing
  • 22. The distribution of bugs over files • Number of bugs per file for the case of Eclipse
  • 23. The distribution of bugs over files • Comparison between the ReLink and the biased bug-fix sets (results of the χ² test, p-values)
  • 24. The distribution of bugs over files • Comparison between the ReLink and the biased bug-fix sets (results of the χ² test, p-values) • RQ1: Will this kind of bias hold for package, class or file features? Not supported by these examples
  • 25. Time over! • So is there no difference between the biased and non-biased datasets? • Then how come the ReLink paper (and others) report improved accuracies when using the non-biased datasets? • What could explain these differences?
  • 26. Impact on prediction accuracy • What is the prediction accuracy using different (biased and non-biased) datasets? • Three datasets • Biased dataset recovered using heuristics • “Golden” dataset, manually recovered • By Sung Kim et al., not me! • Non-biased dataset obtained using the ReLink tool • J48 tree classifier, 10-fold cross-validation • Test datasets always extracted from the golden dataset
  • 27. F-measure values • Procedure • Extract 100 subsamples of the same size for both datasets • Calculate the F-measure using 10-fold cross-validation • The test set is always extracted from the “golden” set • Repeat for several subsample sizes • Only results for the case of OpenIntents so far
  • 29. RQ2: Impact on prediction? • Not clear whether there is any impact
  • 30. Little warning! • The size is not exactly the same for the three cases in each boxplot; the biased dataset is always the smallest of the three. • I have to repeat this using exactly the same size for the three datasets. • RQ2: Impact on prediction? Not clear whether there is any impact
  • 31. Preliminary conclusions • The biased dataset does not provide the worst accuracy when predicting fault proneness for a set of (supposedly) unbiased bug fixes and files • Contrary to what is reported in previous work • What is the cause of the reported differences in accuracy? • By definition, the so-called biased dataset will always be smaller • Dataset size does have an impact on the F-measure
  • 32. 4. The road ahead
  • 33. My workplan at UC Davis • Discuss the ideas shown here • Is bias really a problem for defect prediction? • Extend the study to more cases • Do you have a dataset of files, bugs, commits and metrics? Please let me know! • Improve the study • What happens if we break the data down into more coherent subgroups? • Do the results change at different levels of granularity?
  • 34. 5. Take away and conclusions
  • 35. Systematic difference in bug-fixes collected by heuristics • No observable difference in the statistical properties of the so-called biased dataset • Impact on prediction accuracy not clear • Ecological inference: What happens at other scales? With other subgroups?
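Slide 10 says bug-fix datasets are built with heuristics that relate fixed bug reports to commits. Below is a minimal sketch of one common kind of heuristic, assuming commit messages mention bug ids in forms such as `bug 1234`, `fixes #1234` or a bare `#1234`. The regular expression, the tuple layout of `commits` and the function name are hypothetical illustrations, not the procedure used to build the Eclipse dataset.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical convention: "bug 1234", "Bug #1234", "fix 1234", "fixes #1234" or "#1234".
BUG_ID_PATTERN = re.compile(r"(?:\b(?:bug|fix(?:e[sd])?)\s*#?\s*|#)(\d+)", re.IGNORECASE)


def link_commits_to_bugs(
    commits: Iterable[Tuple[str, str, List[str]]],
    fixed_bug_ids: Set[int],
) -> Dict[int, Set[str]]:
    """Map each fixed bug id to the files touched by commits that mention it.

    commits: (commit_id, message, touched_files) tuples (hypothetical layout).
    Ids not in fixed_bug_ids are ignored, so a message naming an unknown or
    still-open bug produces no link.
    """
    links: Dict[int, Set[str]] = {}
    for _commit_id, message, files in commits:
        for match in BUG_ID_PATTERN.finditer(message):
            bug_id = int(match.group(1))
            if bug_id in fixed_bug_ids:
                links.setdefault(bug_id, set()).update(files)
    return links
```

A commit that fixed a bug without naming it in its message is never linked by a heuristic of this kind; misses like that are one way heuristic recovery ends up with only a subset of the true bug-fix pairs, as slide 19 concludes.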
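Slide 12 states that the distribution of faults over packages is a Weibull distribution. Its complementary cumulative distribution function (CCDF) is S(x) = exp(-(x/λ)^k), with shape k and scale λ, so ln(-ln S(x)) is linear in ln x. The sketch below fits k and λ by least squares on that straight line (a Weibull plot), using Bernard's approximation for the empirical CCDF. This rank-regression fit is an assumption chosen for brevity: the slides do not say how Zhang or Concas et al. fitted their models, and maximum likelihood is a common alternative. Packages with zero faults are dropped because ln 0 is undefined.

```python
import math
from typing import List, Sequence, Tuple


def weibull_ccdf(x: float, shape: float, scale: float) -> float:
    """P(X > x) for a Weibull(shape k, scale lambda) variable, x >= 0."""
    return math.exp(-((x / scale) ** shape))


def fit_weibull_ccdf(values: Sequence[float]) -> Tuple[float, float]:
    """Estimate (shape, scale) by least squares on a Weibull plot.

    Sorted values get Bernard's approximation S_i = 1 - (i - 0.3) / (n + 0.4).
    Since ln(-ln S) = k ln x - k ln lambda, the slope of the fitted line is k
    and the intercept is -k ln lambda. Non-positive values are discarded.
    """
    xs = sorted(v for v in values if v > 0)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two positive values")
    px: List[float] = []
    py: List[float] = []
    for i, x in enumerate(xs, start=1):
        s = 1.0 - (i - 0.3) / (n + 0.4)
        px.append(math.log(x))
        py.append(math.log(-math.log(s)))
    mx, my = sum(px) / n, sum(py) / n
    sxx = sum((a - mx) ** 2 for a in px)  # zero if all values are equal
    sxy = sum((a - mx) * (b - my) for a, b in zip(px, py))
    shape = sxy / sxx
    intercept = my - shape * mx
    scale = math.exp(-intercept / shape)
    return shape, scale
```

Comparing `weibull_ccdf` with the fitted parameters against the empirical CCDF on log-log axes is the kind of visual check a CCDF plot provides.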
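Slide 14 contrasts Zhang's Alberg diagrams with the CCDF plots of Concas et al. An Alberg diagram ranks modules from most to least faulty and plots the cumulative percentage of faults against the percentage of modules. A minimal sketch of computing its points follows; the function name is hypothetical and plotting is left out.

```python
from typing import List, Sequence, Tuple


def alberg_curve(fault_counts: Sequence[int]) -> List[Tuple[float, float]]:
    """Points (% of modules, % of faults) of an Alberg diagram.

    Modules are ranked from most to least faulty; each point gives the share
    of all faults contained in the top share of modules.
    """
    ranked = sorted(fault_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no faults to plot")
    n = len(ranked)
    points: List[Tuple[float, float]] = [(0.0, 0.0)]
    cumulative = 0
    for i, count in enumerate(ranked, start=1):
        cumulative += count
        points.append((100.0 * i / n, 100.0 * cumulative / total))
    return points
```

For fault counts `[0, 10, 5, 0, 5]` the second point is `(20.0, 50.0)`: the top 20% of modules hold 50% of the faults.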
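Slides 23 and 24 compare the bugs-per-file distributions of the ReLink and biased bug-fix sets with a χ² test. The slides do not state the binning or the test variant, so the sketch below assumes a Pearson test of homogeneity on a 2 × k table, where each row counts files with 0, 1, 2, ... bugs and a hypothetical `max_bin` groups the long tail into one last bin. The p-value is computed with the standard library only; `scipy.stats.chi2_contingency` would be the usual shortcut.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def bugs_per_file_histogram(bug_counts: Sequence[int], max_bin: int) -> List[int]:
    """Files with 0, 1, ..., max_bin - 1 bugs, plus a last bin for >= max_bin."""
    capped = Counter(min(c, max_bin) for c in bug_counts)
    return [capped.get(b, 0) for b in range(max_bin + 1)]


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) of a chi-square variable with integer df >= 1.

    Equals Q(df/2, x/2), built up from Q(1, y) = exp(-y) (even df) or
    Q(1/2, y) = erfc(sqrt(y)) (odd df) with
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    y = x / 2.0
    if df % 2 == 0:
        p, a = math.exp(-y), 1.0
    else:
        p, a = math.erfc(math.sqrt(y)), 0.5
    while a < df / 2.0:
        if y > 0:
            p += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return p


def chi2_homogeneity(row_a: Sequence[int], row_b: Sequence[int]) -> Tuple[float, int, float]:
    """Pearson chi-square test that two frequency rows share one distribution.

    Returns (statistic, degrees of freedom, p-value). Bins empty in both rows are dropped.
    """
    cols = [(a, b) for a, b in zip(row_a, row_b) if a + b > 0]
    n_a = sum(a for a, _ in cols)
    n_b = sum(b for _, b in cols)
    if len(cols) < 2 or n_a == 0 or n_b == 0:
        raise ValueError("need two non-empty rows and at least two non-empty bins")
    n = n_a + n_b
    stat = 0.0
    for a, b in cols:
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * (a + b) / n
            stat += (observed - expected) ** 2 / expected
    df = len(cols) - 1
    return stat, df, chi2_sf(stat, df)
```

Identical rows give a statistic of 0 and a p-value of 1; only a small p-value would indicate that the two sets distribute bugs over files differently, which is what RQ1 on slide 24 reports as not supported.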
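Slide 26 trains a J48 tree (Weka's implementation of C4.5) with 10-fold cross-validation, always taking the test data from the golden dataset. One possible reading of that setup, sketched under assumptions: the same files are split into folds, each model is trained on labels from the dataset under study (biased, golden or ReLink), and it is scored against golden labels on the held-out fold. `train` is a hypothetical stand-in for J48, and the feature and label layouts are illustrative.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]
Classifier = Callable[[Features], bool]
# Hypothetical stand-in for J48: takes (features, is_defective) pairs, returns a classifier.
Trainer = Callable[[List[Tuple[Features, bool]]], Classifier]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def cross_validate_against_golden(
    features: Sequence[Features],
    train_labels: Sequence[bool],
    golden_labels: Sequence[bool],
    train: Trainer,
    k: int = 10,
    seed: int = 0,
) -> Tuple[int, int, int, int]:
    """Pooled (tp, fp, fn, tn) over k folds.

    Training uses train_labels (the dataset under study); scoring on the
    held-out fold always uses golden_labels.
    """
    tp = fp = fn = tn = 0
    for test_fold in k_fold_indices(len(features), k, seed):
        held_out = set(test_fold)
        training = [
            (features[i], train_labels[i]) for i in range(len(features)) if i not in held_out
        ]
        model = train(training)
        for i in test_fold:
            predicted, actual = model(features[i]), golden_labels[i]
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn
```

Swapping `train_labels` between the three labelings while keeping `golden_labels` fixed gives the comparison described on the slide; a real replication would plug a decision-tree learner into `train`.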
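Slides 27 and 30 describe the evaluation: 100 subsamples per dataset, an F-measure from 10-fold cross-validation, repeated for several subsample sizes, with the caveat that the sizes were not yet identical because the biased dataset is always the smallest. A sketch of the two building blocks, assuming the usual F1 definition (harmonic mean of precision and recall) and random sampling without replacement; the function names are hypothetical and the cross-validation step itself is left out.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1, the harmonic mean of precision and recall (= 2tp / (2tp + fp + fn)).

    Returns 0.0 when there are no true positives.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def equal_size_subsamples(
    datasets: Dict[str, Sequence[T]], size: int, repetitions: int = 100, seed: int = 0
) -> Dict[str, List[List[T]]]:
    """Draw `repetitions` subsamples of exactly `size` items from every dataset.

    `size` may not exceed the smallest dataset, so every dataset is compared
    at the same size.
    """
    smallest = min(len(items) for items in datasets.values())
    if not 0 < size <= smallest:
        raise ValueError(f"size must be in 1..{smallest}, got {size}")
    rng = random.Random(seed)
    return {
        name: [rng.sample(list(items), size) for _ in range(repetitions)]
        for name, items in datasets.items()
    }
```

Capping `size` at the smallest dataset keeps the three boxplots comparable; slide 31 notes that dataset size does affect the F-measure, which is why it should be held fixed while the sizes are varied together.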
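Slide 33 asks whether the results change at different levels of granularity, and RQ1 on slide 16 mentions package, class and file features. A sketch of moving from file-level to package-level bug counts, assuming bug-to-files links like those recovered from commits and a hypothetical `package_of` rule that treats a file's parent directory as its package. Counting distinct bugs per unit matters: summing per-file counts would count a bug that touches two files of one package twice.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Mapping, Set


def package_of(path: str) -> str:
    """Hypothetical convention: a file's package is its parent directory."""
    return path.rsplit("/", 1)[0] if "/" in path else ""


def bugs_per_unit(
    links: Mapping[int, Iterable[str]], unit_of: Callable[[str], str]
) -> Dict[str, int]:
    """Count distinct bugs per unit (file, package, ...) from bug -> files links."""
    bugs: Dict[str, Set[int]] = defaultdict(set)
    for bug_id, files in links.items():
        for path in files:
            bugs[unit_of(path)].add(bug_id)
    return {unit: len(ids) for unit, ids in bugs.items()}
```

Passing `lambda p: p` as `unit_of` gives file-level counts and `package_of` gives package-level counts, so the same distribution comparison can be repeated at each level.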