MaLTeSQuE 2019
Aug 27, 2019
Markus Borg
@mrksbrg
mrksbrg.com
RISE Research Institutes of Sweden AB
SZZ Unleashed: An Open Implementation of the SZZ Algorithm
Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project
Daniel Hansson
Oscar Svensson
Kristian Berg
https://github.com/wogscpar/SZZUnleashed
Feed ML with SZZ output
SZZ Unleashed is on GitHub
Who is Markus?
• Development engineer, ABB 2007-2010
– Process automation
– Editor and compiler development
• PhD student, Lund University 2010-2015
– Requirements engineering and testing
– Traceability, change impact analysis
• Senior researcher, RISE 2015-
More of Markus
• Adjunct lecturer (20%), Lund University
– Teaching software engineering
• Member of the board (10%), Swedsoft
– Influence decision makers
– Write comment letters
– Facilitate networking
Motivation
ML is data-hungry
• ML in SE often relies on bug data
• Bug trackers contain info about fixes
• What about when bugs were introduced?
– We need these commits!
Śliwerski, Zimmermann, and Zeller (SZZ)
• A heuristic approach to find bug-introducing commits
• “Few publicly available implementations” (Rodríguez-Pérez et al., 2018)
• Many homegrown SZZ implementations
• Wasted research effort on commodity development
Rodríguez-Pérez, Robles, and González-Barahona. Reproducibility and credibility in empirical software engineering: A case study based on a systematic literature review of the use of the SZZ algorithm. Information and Software Technology, 99, pp. 164-176, 2018.
Van der Linden, Lundell, and Marttiin. Commodification of industrial software: A case for open source. IEEE Software, 26(4), pp. 77-83, 2009.
SZZ 101
SZZ in a nutshell
Use closed bug reports to find bug-fixing commits
Phase 2: (A) Bug-fixing commits → (B) git blame → (C) Bug-introducing commit candidates
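Finding bug-fixing commits from closed bug reports can be sketched as scanning commit messages for the reports' issue keys. This is a minimal sketch, not SZZ Unleashed's own code: it assumes JIRA-style keys such as `JENKINS-1234`, and the helper `find_bug_fixing_commits` and its `(sha, message)` input format are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical key pattern: PROJECT-NUMBER, e.g. JENKINS-1234.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

def find_bug_fixing_commits(
    closed_bug_keys: Set[str],
    commits: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each closed bug key to the commits whose message mentions it."""
    fixes: Dict[str, List[str]] = {}
    for sha, message in commits:
        # A commit may mention several keys; count each key once per commit.
        for key in set(ISSUE_KEY.findall(message)):
            if key in closed_bug_keys:
                fixes.setdefault(key, []).append(sha)
    return fixes
```

This trusts any mention of a key; real histories also contain reverts and incidental references that a production tool would have to handle.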
Find all commits that changed the buggy lines of code, i.e., run git blame (B) on the lines modified by the bug-fixing commits (A) to obtain bug-introducing commit candidates (C)
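The git blame step can be sketched by blaming each file as it was in the fix commit's parent and collecting the commits that last touched the lines the fix modified or deleted. A minimal sketch, assuming SHA-1 (40-hex) object names and that the buggy line numbers (in the parent revision) were already extracted from the fix diff; the helper names are hypothetical. Only `blame_before_fix` needs a real git checkout.

```python
import re
import subprocess
from typing import Dict, Iterable, Set

# A porcelain header line: 40-hex SHA, original line number, final line
# number, and (on the first line of a group) the number of lines in the group.
_HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: \d+)?$")

def parse_blame_porcelain(output: str) -> Dict[int, str]:
    """Return {line number in the blamed revision: SHA that last changed it}."""
    blame: Dict[int, str] = {}
    for line in output.splitlines():
        match = _HEADER.match(line)  # content lines start with a TAB, never match
        if match:
            blame[int(match.group(3))] = match.group(1)
    return blame

def candidate_introducers(blame: Dict[int, str], buggy_lines: Iterable[int]) -> Set[str]:
    """Commits that last touched the lines modified or deleted by a fix."""
    return {blame[n] for n in buggy_lines if n in blame}

def blame_before_fix(repo: str, fix_sha: str, path: str) -> Dict[int, str]:
    """Blame `path` as it was in the parent of the fix commit (requires git)."""
    out = subprocess.run(
        ["git", "-C", repo, "blame", "--porcelain", f"{fix_sha}^", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_blame_porcelain(out)
```

Blaming the parent (`fix_sha^`) matters: blaming the fix commit itself would attribute the changed lines to the fix.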
Filter the candidates (too recent? partial fix? buggy fix?); the remaining candidates are the bug-introducing commits
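Of the three filters, "too recent?" is the easiest to illustrate: a candidate committed after the bug was reported is unlikely to have introduced it. A minimal sketch, assuming commit dates and the report's creation date are available as `datetime` objects; `filter_too_recent` is hypothetical, and the "partial fix?" and "buggy fix?" checks are not covered.

```python
from datetime import datetime
from typing import Dict, Set

def filter_too_recent(
    candidates: Set[str],
    commit_dates: Dict[str, datetime],
    issue_created: datetime,
) -> Set[str]:
    """Keep only candidates committed no later than the bug report's creation."""
    return {sha for sha in candidates if commit_dates[sha] <= issue_created}
```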
SZZ Unleashed
https://github.com/wogscpar/SZZUnleashed
Input: target project → Output: JSON
[["a79fdaa4b34b8f7fddb39bed3eabf4763940d11b",
"26ec7bdf936dfbc3f496b1165cea36488a3a06b2"],
["a79fdaa4b34b8f7fddb39bed3eabf4763940d11b",
"05b46659e451c316fb5f1a5243c49b9a84a50702"],
…
["a79fdaa4b34b8f7fddb39bed3eabf4763940d11b",
"4e7a43c5863b5e7ad637a5034f75d3c144c45129"],
["a79fdaa4b34b8f7fddb39bed3eabf4763940d11b",
"b89baa56bf06b2a0f6b67a3e521236e476fe5a9d"],
["a79fdaa4b34b8f7fddb39bed3eabf4763940d11b",
"05b46659e451c316fb5f1a5243c49b9a84a50702"]]
Commit Features
• Code churn, as defined by Nagappan and Ball (2005)
– Lines of code added / Total lines of code
– Lines of code deleted / Total lines of code
– Files churned / Number of files
• Features used by Kamei et al. in “A Large-Scale Empirical Study of Just-in-Time Quality Assurance,” IEEE Transactions on Software Engineering, 39(6), 2013
– Lines of code in previous version
– Number of modified subsystems
– Number of modified sub-directories
– Entropy (spreading of changes)
– Purpose of a change (e.g., bug fix)
– Number of previous committers
– Time between committer’s contributions
– Number of unique changes
– Overall experience of committer
– Recent experience of committer
• Coupling measures proposed by D’Ambros et al. (2009)
– Number of highly coupled files
– Number of coupled files for all degrees
– Number of non-modified coupled files
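Two of these features are plain arithmetic. A minimal sketch, assuming per-commit counts are already extracted (function names and inputs are hypothetical): the churn ratios divide added lines, deleted lines, and churned files by the corresponding totals, and entropy is computed as Shannon entropy H = -Σ p_k log2 p_k over the share p_k of the commit's modified lines falling in file k. This is one common formulation; the slide does not say whether the study normalized it.

```python
import math
from typing import Dict, List

def relative_churn(added: int, deleted: int, total_loc: int,
                   files_churned: int, total_files: int) -> Dict[str, float]:
    """The three churn ratios listed above (Nagappan and Ball style)."""
    return {
        "added_ratio": added / total_loc,
        "deleted_ratio": deleted / total_loc,
        "files_ratio": files_churned / total_files,
    }

def change_entropy(lines_modified_per_file: List[int]) -> float:
    """Shannon entropy (bits) of how a commit's modified lines spread across files."""
    total = sum(lines_modified_per_file)
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in lines_modified_per_file:
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy
```

A commit touching one file scores 0; spreading the same change evenly over more files raises the score, which is what "spreading of changes" captures.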
Using SZZ for ML
Goal: Just-in-time bug prediction
• Axis is interested in commit-level bug prediction
– Highlight commits that need more review
• Proof-of-concept for Jenkins
– Axis is a frequent contributor to Jenkins
– Jenkins is open source
Method
• Jenkins dataset (~12 years, 2006-2018)
– 26,378 commits (3.6% bug-introducing)
• Trained a random forest classifier on 16 commit features
RQ1: Effects of oversampling and undersampling?
RQ2: Difference between cross-validation and a time-sensitive evaluation?
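With only 3.6% positive commits, RQ1's resampling matters. A minimal sketch of plain random oversampling and undersampling, assuming label 1 marks the bug-introducing (minority) class; these are simple stand-ins, and the slides do not say which resampling variant the study used. Resampling should be applied to the training data only, never to the evaluation data.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def random_oversample(samples: Sequence[T], labels: Sequence[int],
                      seed: int = 0) -> Tuple[List[T], List[int]]:
    """Duplicate random minority (label 1) samples until both classes are equal."""
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == 1]
    majority_count = sum(1 for y in labels if y == 0)
    needed = max(0, majority_count - len(minority))
    extra = rng.choices(minority, k=needed) if minority else []
    return list(samples) + extra, list(labels) + [1] * len(extra)

def random_undersample(samples: Sequence[T], labels: Sequence[int],
                       seed: int = 0) -> Tuple[List[T], List[int]]:
    """Keep all minority samples and an equal-sized random subset of the majority."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    kept_neg = rng.sample(neg, k=min(len(neg), len(pos)))
    return pos + kept_neg, [1] * len(pos) + [0] * len(kept_neg)
```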
Relative Importance of the Features
• Churn
– Lines of code added / Total lines of code: 0.17
– Lines of code deleted / Total lines of code: 0.04
– Files churned / Number of files: 0.08
• Other features
– Lines of code in previous version: 0.07
– Number of modified subsystems: 0.11
– Number of modified sub-directories: 0.09
– Entropy (spreading of changes): 0.16
– Purpose of a change (e.g., bug fix): 0.03
– Number of previous committers: 0.08
– Time between committer’s contributions: 0.04
– Number of unique changes: 0.04
– Overall experience of committer: 0.04
– Recent experience of committer: 0.03
• Coupling
– Number of highly coupled files: 0.00
– Number of coupled files for all degrees: 0.01
– Number of non-modified coupled files: 0.01
1. Churn
2. Size
3. #Committers
Conclusion
Answering the RQs…
RQ1: Effects of oversampling and undersampling?
• Baseline (no resampling) is too conservative (<3% recall)
• Oversampling is essential
RQ2: Difference between cross-validation and a time-sensitive evaluation?
• Disregarding time gives overly optimistic recall (twice as high)
• Go beyond cross-validation
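A time-sensitive evaluation trains only on the past and tests on the future, whereas shuffled cross-validation lets the model learn from commits newer than the ones it is tested on. A minimal sketch of a single chronological split, assuming commits are hypothetical `(sha, date, label)` tuples and an 80/20 cut; the slides do not state the split the study used.

```python
from datetime import datetime
from typing import List, Tuple

Commit = Tuple[str, datetime, int]  # hypothetical: (sha, commit date, label)

def time_ordered_split(commits: List[Commit], train_fraction: float = 0.8
                       ) -> Tuple[List[Commit], List[Commit]]:
    """Train on the oldest commits and test on the newest (no look-ahead)."""
    ordered = sorted(commits, key=lambda c: c[1])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]
```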
But a 10-15% F-score is low…
Current focus: SZZ for Faster Automatic Program Repair
[Diagram: locating a regression fault among commits, by binary search and by ML for risk profiling of commits]
• ML for risk profiling of commits
• Complement training data with bug-introducing commits from SZZ
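The binary-search baseline in the diagram works like `git bisect`: test the middle commit and halve the range until the first failing commit is found. A minimal sketch, assuming commits are in chronological order, the last one is known to be bad, and every commit after the first bad one is also bad; `first_bad_commit` and the `is_bad` test callback are hypothetical. The ML alternative would instead rank commits by predicted risk, which is not sketched here.

```python
from typing import Callable, Sequence

def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> int:
    """Return the index of the first commit for which is_bad() holds."""
    lo, hi = 0, len(commits) - 1  # invariant: the first bad commit is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # mid is bad: the first bad commit is mid or earlier
        else:
            lo = mid + 1  # mid is good: the first bad commit comes later
    return lo
```

Each probe typically means a build and test run, so the roughly log2(n) probes dominate the cost; this is what risk profiling of commits aims to reduce.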
Feed ML with SZZ output
SZZ Unleashed is on GitHub
markus.borg@ri.se
@mrksbrg
mrksbrg.com


SZZ Unleashed: An Open Implementation of the SZZ Algorithm

  • 1. MaLTeSQuE 2019 Aug 27, 2019 Markus Borg @mrksbrg mrksbrg.com RISE Research Institutes of Sweden AB SZZ Unleashed: An Open Implementation of the SZZ Algorithm - Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project
  • 2. Daniel Hansson Oscar Svensson Kristian Berg https://github.com/wogscpar/SZZUnleashed
  • 4. Feed ML with SZZ output SZZ Unleashed is on GitHub
  • 5. Who is Markus? • Development engineer, ABB 2007-2010 – Process automation – Editor and compiler development • PhD student, Lund University 2010-2015 – Requirements engineering and testing – Traceability, change impact analysis • Senior researcher, RISE 2015-
  • 6. More of Markus • Adjunct lecturer (20%), Lund University – Teaching software engineering • Member of the board (10%), Swedsoft – Influence decision makers – Write comment letters – Facilitate networking
  • 8. ML is data-hungry • ML in SE often relies on bug data • Bug trackers contain info about fixes • What about when bugs were introduced? – We need these commits!
  • 9. Śliwerski, Zimmermann, and Zeller (SZZ) • A heuristic approach to find bug-introducing commits • “Few publicly available implementations” - Rodríguez-Pérez et al. (2018) • Many homegrown SZZ implementations • Wasted research effort on commodity development. Rodríguez-Pérez, Robles, and González-Barahona. Reproducibility and credibility in empirical software engineering: A case study based on a systematic literature review of the use of the SZZ algorithm. Information and Software Technology, 99, pp.164-176, 2018.
  • 10. Van der Linden, Lundell, and Marttiin. Commodification of industrial software: A case for open source. IEEE Software, 26(4), pp.77-83, 2009.
  • 12. SZZ in a nutshell Use closed bug reports to find bug-fixing commits Phase 2
  • 13. SZZ in a nutshell: Find all commits that changed the buggy lines of code. Bug-fixing commits (A) → git blame (B) → Bug-introducing commit candidates (C)
  • 14. SZZ in a nutshell: Bug-fixing commits (A) → git blame (B) → Bug-introducing commit candidates (C) → checks (too recent? partial fix? buggy fix?) → Bug-introducing commits
  • 19. Commit features
    – Code churn as defined by Nagappan and Ball (2005): Lines of code added / Total lines of code; Lines of code deleted / Total lines of code; Files churned / Number of files
    – Used by Kamei et al. in “A Large-scale Empirical Study of Just-in-Time Quality Assurance”, IEEE Transactions on Software Engineering, 39(6), 2013: Lines of code in previous version; Number of modified subsystems; Number of modified sub-directories; Entropy (spreading of changes); Purpose of a change (e.g., bug fix); Number of previous committers; Time between committer’s contributions; Number of unique changes; Overall experience of committer; Recent experience of committer
    – Coupling measures proposed by D’Ambros et al. (2009): Number of highly coupled files; Number of coupled files for all degrees; Number of non-modified coupled files
  • 21. Goal: Just-in-time bug prediction • Axis is interested in commit-level bug prediction – Highlight commits that need more review • Proof-of-concept for Jenkins – Axis is a frequent contributor – Jenkins is open source
  • 22. Method • Jenkins dataset (~12 years, 2006-2018) – 26,378 commits (3.6% bug-introducing) • Trained a random forest classifier on 16 commit features • RQ1: Effects of oversampling and undersampling? • RQ2: Difference between cross-validation and a time-sensitive evaluation?
  • 23. Relative importance of the features
    – Churn: Lines of code added / Total lines of code: 0.17; Lines of code deleted / Total lines of code: 0.04; Files churned / Number of files: 0.08
    – Other features: Lines of code in previous version: 0.07; Number of modified subsystems: 0.11; Number of modified sub-directories: 0.09; Entropy (spreading of changes): 0.16; Purpose of a change (e.g., bug fix): 0.03; Number of previous committers: 0.08; Time between committer’s contributions: 0.04; Number of unique changes: 0.04; Overall experience of committer: 0.04; Recent experience of committer: 0.03
    – Coupling: Number of highly coupled files: 0.00; Number of coupled files for all degrees: 0.01; Number of non-modified coupled files: 0.01
    – 1. Churn 2. Size 3. #Committers
  • 26. Answering the RQs… RQ1: Effects of oversampling and undersampling? • Baseline sampling is too conservative (<3% recall) • Oversampling is essential • RQ2: Difference between cross-validation and a time-sensitive evaluation? • Disregarding time gives overly optimistic recall (twice as high) • Go beyond cross-validation • But a 10-15% F-score is low…
  • 27. Current focus: SZZ for faster automatic program repair • Locating the commit behind a regression fault: binary search over commits, or ML for risk profiling of commits • Complement training data with bug-introducing commits from SZZ
  • 28. Feed ML with SZZ output SZZ Unleashed is on GitHub markus.borg@ri.se @mrksbrg mrksbrg.com
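Slide 12's first SZZ step, using closed bug reports to find bug-fixing commits, is often done by matching issue keys in commit messages. The sketch below assumes Jenkins-style JIRA keys of the form `JENKINS-<number>` and takes the commit log and the set of closed bug IDs as plain Python data. The function name `bug_fixing_commits` is hypothetical and is not part of SZZ Unleashed.

```python
import re
from typing import Iterable

# Assumption: issue keys follow the JIRA pattern used by the Jenkins tracker.
ISSUE_KEY = re.compile(r"\bJENKINS-(\d+)\b")


def bug_fixing_commits(commits: Iterable[tuple[str, str]], closed_bug_ids: set[int]) -> list[str]:
    """Return hashes of commits whose message mentions at least one closed bug report.

    `commits` holds (hash, message) pairs, e.g. parsed from `git log` output.
    """
    fixing = []
    for commit_hash, message in commits:
        mentioned = {int(key) for key in ISSUE_KEY.findall(message)}
        if mentioned & closed_bug_ids:
            fixing.append(commit_hash)
    return fixing


if __name__ == "__main__":
    log = [
        ("a1", "[FIXED JENKINS-123] NPE in build queue"),
        ("b2", "Refactor logging"),
        ("c3", "JENKINS-456: improve docs"),
    ]
    print(bug_fixing_commits(log, {123}))  # ['a1']
```

Keyword matching misses fixes whose messages never mention the issue key, so the set of bug-fixing commits found this way is a lower bound.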
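Slide 13's `git blame` step can be sketched in two parts: find the lines that a fix deleted or modified in the pre-fix version of a file, then ask `git blame --porcelain <fix>^ -- <path>` which commits last touched those lines. This is a minimal illustration of the classic SZZ idea, not SZZ Unleashed's implementation. It works on diff and blame text already captured from git (for example with `git diff <fix>^ <fix> -- <path>`), assumes 40-character SHA-1 hashes, and the helper names are hypothetical.

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,4 +10,4 @@"; group 1 is the first old-file line.
HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")
# Porcelain header line: "<40-hex sha> <orig line> <final line> [<lines in group>]".
BLAME_HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: \d+)?$")


def removed_lines(diff_text: str) -> list[int]:
    """Old-version line numbers that a fix deleted or modified (pass one file's diff)."""
    removed: list[int] = []
    old_line = None  # None until the first hunk header is seen
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            old_line = None  # skip the "---"/"+++" file headers that follow
            continue
        hunk = HUNK.match(line)
        if hunk:
            old_line = int(hunk.group(1))
            continue
        if old_line is None or line.startswith("\\"):
            continue  # file headers or "\ No newline at end of file"
        if line.startswith("-"):
            removed.append(old_line)  # a modified line shows up as "-" followed by "+"
            old_line += 1
        elif not line.startswith("+"):
            old_line += 1  # context line, present in both versions
    return removed


def blame_commits(porcelain: str, lines: list[int]) -> set[str]:
    """Commits that last changed `lines`, parsed from `git blame --porcelain` output."""
    wanted = set(lines)
    found: set[str] = set()
    for row in porcelain.splitlines():
        header = BLAME_HEADER.match(row)
        if header and int(header.group(3)) in wanted:
            found.add(header.group(1))
    return found
```

In SZZ terms, `blame_commits(blame_of_parent, removed_lines(fix_diff))` yields the bug-introducing commit candidates (C). Real implementations typically also skip blank and comment-only lines, which this sketch does not.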
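Slide 14 lists checks applied to the candidates. Only the first, "too recent?", is easy to state without project-specific data: a candidate committed after the bug was reported is implausible as the origin of that bug. A minimal sketch, assuming each candidate carries its commit timestamp and the bug report's creation time is known; `Candidate` and `drop_too_recent` are hypothetical names, and the partial-fix and buggy-fix checks are not covered.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Candidate:
    """A bug-introducing commit candidate returned by the blame step (hypothetical type)."""
    commit: str
    committed_at: datetime


def drop_too_recent(candidates: list[Candidate], bug_reported_at: datetime) -> list[Candidate]:
    """Keep only candidates committed before the bug report was opened.

    Both timestamps must be either timezone-aware or naive; mixing them raises TypeError.
    """
    return [c for c in candidates if c.committed_at < bug_reported_at]


if __name__ == "__main__":
    found = [
        Candidate("x1", datetime(2018, 1, 10)),
        Candidate("y2", datetime(2018, 5, 2)),
    ]
    kept = drop_too_recent(found, bug_reported_at=datetime(2018, 3, 1))
    print([c.commit for c in kept])  # ['x1']
```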
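Two kinds of slide 19 features can be computed from a single commit: the relative churn ratios and the entropy of how a change is spread across files. The sketch assumes entropy is the Shannon entropy of each file's share of the modified lines (the slide does not say whether the study normalized it), and the denominators `loc_before` and `file_count` are assumptions about what "Total lines of code" and "Number of files" refer to. The function names are hypothetical.

```python
import math


def churn_ratios(added: int, deleted: int, loc_before: int,
                 files_churned: int, file_count: int) -> dict[str, float]:
    """Relative churn: changed lines and files divided by the size of the code."""
    # Assumption: guard against empty denominators (e.g. a brand-new component).
    loc = max(loc_before, 1)
    files = max(file_count, 1)
    return {
        "added_ratio": added / loc,
        "deleted_ratio": deleted / loc,
        "files_churned_ratio": files_churned / files,
    }


def change_entropy(lines_changed_per_file: list[int]) -> float:
    """Shannon entropy (bits) of how a commit's modified lines spread over files."""
    total = sum(lines_changed_per_file)
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in lines_changed_per_file:
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    print(churn_ratios(added=20, deleted=10, loc_before=200, files_churned=2, file_count=8))
    print(change_entropy([10]), change_entropy([5, 5]), change_entropy([1, 1, 1, 1]))  # 0.0 1.0 2.0
```

A commit touching one file has entropy 0, while spreading the same change evenly over four files gives 2 bits, which is how entropy captures the "spreading of changes".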
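RQ1 on slide 22 compares oversampling and undersampling for a dataset where only 3.6% of commits are bug-introducing. The slide does not say which resampling technique was used, so the sketch below shows plain random oversampling (duplicating minority examples) and random undersampling (dropping majority examples) as an illustration, with hypothetical function names. Resampling belongs on the training data only, so that the evaluation still reflects the real class imbalance.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def _split(y: Sequence[int], minority: int) -> tuple[list[int], list[int]]:
    """Indices of minority-class and majority-class examples."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    return minority_idx, majority_idx


def random_oversample(X: Sequence[T], y: Sequence[int], minority: int = 1, seed: int = 0):
    """Duplicate randomly chosen minority examples until both classes are equally large."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split(y, minority)
    if not minority_idx:
        return list(X), list(y)  # nothing to duplicate
    extra = [rng.choice(minority_idx) for _ in range(len(majority_idx) - len(minority_idx))]
    idx = majority_idx + minority_idx + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X: Sequence[T], y: Sequence[int], minority: int = 1, seed: int = 0):
    """Keep all minority examples and an equally large random subset of the majority."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split(y, minority)
    kept = rng.sample(majority_idx, min(len(minority_idx), len(majority_idx)))
    idx = minority_idx + kept
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


if __name__ == "__main__":
    X = list(range(100))
    y = [1] * 4 + [0] * 96  # 4% positives, close to the Jenkins imbalance
    _, yo = random_oversample(X, y)
    _, yu = random_undersample(X, y)
    print(yo.count(1), yo.count(0))  # 96 96
    print(yu.count(1), yu.count(0))  # 4 4
```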
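RQ2 on slide 26 contrasts cross-validation with a time-sensitive evaluation. One common way to respect time, sketched here under that assumption, is to sort commits by date, train on the older part and test on the newer part, so the model never learns from the future. The split fraction and the function names are hypothetical; precision, recall and F-score are the standard definitions for the positive (bug-introducing) class.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def time_ordered_split(items: Sequence[T], timestamps: Sequence[float],
                       train_fraction: float = 0.8) -> tuple[list[T], list[T]]:
    """Train on the oldest items and test on the newest ones (no shuffling)."""
    order = sorted(range(len(items)), key=lambda i: timestamps[i])
    cut = int(len(order) * train_fraction)
    return [items[i] for i in order[:cut]], [items[i] for i in order[cut:]]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float, float]:
    """Scores for the positive (bug-introducing) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    commits = ["c", "a", "b", "d"]
    dates = [3, 1, 2, 4]  # e.g. Unix timestamps
    print(time_ordered_split(commits, dates, train_fraction=0.5))  # (['a', 'b'], ['c', 'd'])
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5)
```

With random k-fold cross-validation, commits from 2018 can land in the training folds while 2010 commits are tested, which is one plausible reason why disregarding time inflates recall.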
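Slide 27 shows both binary search over commits and ML risk profiling of commits as ways to locate the commit behind a regression fault. The sketch below is a hypothetical illustration of how the two could combine, not the authors' method: a bisection (in the spirit of `git bisect`) that, when per-commit risk scores are available, probes the commit splitting the remaining risk mass in half instead of the midpoint. It assumes commits are ordered oldest to newest, the newest one is known to be faulty, and the fault persists once introduced; `first_bad_commit` and the risk scores are assumptions.

```python
from typing import Callable, Optional, Sequence


def first_bad_commit(n: int, is_bad: Callable[[int], bool],
                     risk: Optional[Sequence[float]] = None) -> tuple[int, int]:
    """Return (index of the first bad commit, number of test runs).

    Commits are indexed 0..n-1 from oldest to newest; commit n-1 must be bad,
    and every commit after the first bad one must also be bad.
    """
    lo, hi = 0, n - 1  # invariant: the first bad commit lies in [lo, hi]
    probes = 0
    while lo < hi:
        if risk is None:
            mid = (lo + hi) // 2  # plain bisection
        else:
            # Probe where the cumulative risk of lo..mid is closest to half the remaining risk.
            half = sum(risk[lo:hi + 1]) / 2
            best_gap, acc, mid = float("inf"), 0.0, lo
            for i in range(lo, hi):  # mid < hi guarantees progress
                acc += risk[i]
                if abs(acc - half) < best_gap:
                    best_gap, mid = abs(acc - half), i
        probes += 1
        if is_bad(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, probes


if __name__ == "__main__":
    def is_bad(i: int) -> bool:
        return i >= 11  # hypothetical: commit 11 introduced the regression

    risk = [0.01] * 16
    risk[11] = 0.85  # hypothetical ML risk score concentrated on commit 11
    print(first_bad_commit(16, is_bad))        # (11, 4)
    print(first_bad_commit(16, is_bad, risk))  # (11, 2)
```

With an accurate risk profile the fault is found in 2 test runs instead of 4 here; with a misleading one, the weighted probe can need more runs than plain bisection, so the quality of the risk model directly sets the payoff.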