Numerous empirical software engineering studies rely on detailed information about bugs. While issue trackers often contain information about when bugs were fixed, details about when they were introduced to the system are often absent. As a remedy, researchers often rely on the SZZ algorithm as a heuristic approach to identify bug-introducing software changes. Unfortunately, as reported in a recent systematic literature review, few researchers have made their SZZ implementations publicly available. Consequently, there is a risk that research effort is wasted, as new projects based on SZZ output must first reimplement the approach. Furthermore, there is a risk that newly developed (closed source) SZZ implementations have not been properly tested, so research based on their output might introduce threats to validity. We present SZZ Unleashed, an open implementation of the SZZ algorithm for git repositories. This paper describes our implementation along with a usage example for the Jenkins project, and concludes with an illustrative study on just-in-time bug prediction. We hope to continue evolving SZZ Unleashed on GitHub, and warmly invite the community to contribute.
SZZ Unleashed: An Open Implementation of the SZZ Algorithm
1. MaLTeSQuE 2019
Aug 27, 2019
Markus Borg
@mrksbrg
mrksbrg.com
RISE Research Institutes of Sweden AB
SZZ Unleashed: An Open Implementation of the SZZ Algorithm
- Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project
4. Feed ML with SZZ output
SZZ Unleashed is on GitHub
5. Who is Markus?
• Development engineer, ABB 2007-2010
– Process automation
– Editor and compiler development
• PhD student, Lund University 2010-2015
– Requirements engineering and testing
– Traceability, change impact analysis
• Senior researcher, RISE 2015-
6. More of Markus
• Adjunct lecturer (20%), Lund University
– Teaching software engineering
• Member of the board (10%), Swedsoft
– Influence decision makers
– Write comment letters
– Facilitate networking
8. ML is data-hungry
• ML in SE often relies on bug data
• Bug trackers contain info about fixes
• What about when bugs were introduced?
– We need these commits!
9. Śliwerski, Zimmermann, and Zeller (SZZ)
• A heuristic approach to find bug-introducing commits
• “Few publicly available implementations”
- Rodríguez-Pérez et al. (2018)
• Many homegrown SZZ implementations
• Wasted research effort on commodity development
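The heuristic named on this slide can be sketched in a few lines. This is a minimal illustration of the basic SZZ idea (blame the lines that a bug fix removed or changed, and flag the commits that last touched them), not the SZZ Unleashed implementation. The function name `find_bug_introducing`, the data structures, and the toy commit ids are hypothetical, and blame information is given in memory instead of being read from git.

```python
from datetime import date

def find_bug_introducing(deleted_lines, blame, commit_dates, reported_on):
    """Return commits that last touched lines removed by a bug fix.

    deleted_lines: line numbers (in the pre-fix file) removed or changed by the fix
    blame:         line number -> id of the commit that last modified that line
    commit_dates:  commit id -> commit date
    reported_on:   date the bug was reported; later commits cannot have introduced it
    """
    candidates = {blame[line] for line in deleted_lines if line in blame}
    return sorted(c for c in candidates if commit_dates[c] <= reported_on)

# Hypothetical toy data: the fix removed lines 10-12 of one file.
blame = {10: "a1f", 11: "a1f", 12: "c9d", 13: "e07"}
commit_dates = {
    "a1f": date(2018, 1, 5),
    "c9d": date(2018, 3, 20),  # committed after the bug report, so filtered out
    "e07": date(2017, 11, 2),
}
suspects = find_bug_introducing([10, 11, 12], blame, commit_dates, date(2018, 2, 1))
print(suspects)  # ['a1f']
```

In a real setting the `blame` mapping would come from running blame on the parent of the bug-fixing commit.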
Rodríguez-Pérez, Robles, and González-Barahona.
Reproducibility and credibility in empirical software engineering: A case study based on a systematic literature review of the use of the SZZ algorithm.
Information and Software Technology, 99, pp.164-176, 2018.
10. Van der Linden, Lundell, and Marttiin.
Commodification of industrial software: A case for open source.
IEEE Software, 26(4), pp.77-83, 2009.
19. Commit Features
Code churn as defined by Nagappan and Ball (2005):
• Lines of code added / Total lines of code
• Lines of code deleted / Total lines of code
• Files churned / Number of files
• Lines of code in previous version
Used by Kamei et al. in “A Large-scale Empirical Study of Just-in-Time Quality Assurance”, IEEE Transactions on Software Engineering, 39(6), 2013:
• Number of modified subsystems
• Number of modified sub-directories
• Entropy (spreading of changes)
• Purpose of a change (e.g., bug fix)
• Number of previous committers
• Time between committer’s contributions
• Number of unique changes
• Overall experience of committer
• Recent experience of committer
Coupling measures proposed by D’Ambros et al. (2009):
• Number of highly coupled files
• Number of coupled files for all degrees
• Number of non-modified coupled files
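Two of the features above are easy to illustrate with a short sketch. It assumes that relative churn is simply lines added or deleted divided by total lines of code, and that entropy is the usual Shannon entropy (base 2) over the share of modified lines per file. The function names and the numbers are hypothetical, not taken from the study.

```python
import math

def relative_churn(lines_added, lines_deleted, total_lines):
    """Lines added and deleted, each relative to the total lines of code."""
    if total_lines == 0:
        return 0.0, 0.0
    return lines_added / total_lines, lines_deleted / total_lines

def change_entropy(modified_lines_per_file):
    """Shannon entropy (base 2) of how modified lines spread over files."""
    total = sum(modified_lines_per_file)
    if total == 0:
        return 0.0
    probs = [n / total for n in modified_lines_per_file if n > 0]
    return sum(-p * math.log2(p) for p in probs)

added, deleted = relative_churn(lines_added=30, lines_deleted=10, total_lines=200)
print(added, deleted)             # 0.15 0.05
print(change_entropy([10, 10]))   # 1.0
```

A commit that touches a single file has entropy 0, while a commit that spreads its changes evenly over many files gets a higher value.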
21. Goal: Just-in-time bug prediction
• Axis interested in commit-level bug prediction
– Highlight commits that need more review
• Proof-of-concept for Jenkins
– Axis is a frequent contributor
– Jenkins is open source
22. Method
• Jenkins Dataset (~12 years, 2006-2018)
– 26,378 commits (3.6% bug-introducing)
• Trained random forest classifier on 16 commit features
RQ1: Effects of oversampling and undersampling?
RQ2: Difference between cross-validation and a time-sensitive evaluation?
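The two research questions concern how the training data is sampled and how it is split. The sketch below illustrates both ideas with the standard library only. It assumes plain random oversampling (the slides do not say which oversampling variant was used) and a simple chronological split. The dictionary keys `timestamp` and `buggy` and the toy history are hypothetical.

```python
import random

def time_ordered_split(commits, train_fraction=0.8):
    """Train on the oldest commits and test on the newest ones."""
    ordered = sorted(commits, key=lambda c: c["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

def random_oversample(train, seed=0):
    """Duplicate minority-class commits at random until classes are balanced."""
    rng = random.Random(seed)
    buggy = [c for c in train if c["buggy"]]
    clean = [c for c in train if not c["buggy"]]
    minority, majority = sorted([buggy, clean], key=len)
    if not minority:
        return list(train)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return train + extra

# Hypothetical toy history: 100 commits, every 10th one bug-introducing.
commits = [{"timestamp": t, "buggy": t % 10 == 0} for t in range(100)]
train, test = time_ordered_split(commits)
balanced = random_oversample(train)
print(len(train), len(test), len(balanced))  # 80 20 144
```

Oversampling is applied to the training part only, so the test part keeps the original class imbalance and contains no commit older than any training commit.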
23. Relative Importance of the Features
Churn
• Lines of code added / Total lines of code: 0.17
• Lines of code deleted / Total lines of code: 0.04
• Files churned / Number of files: 0.08
• Lines of code in previous version: 0.07
Other features
• Number of modified subsystems: 0.11
• Number of modified sub-directories: 0.09
• Entropy (spreading of changes): 0.16
• Purpose of a change (e.g., bug fix): 0.03
• Number of previous committers: 0.08
• Time between committer’s contributions: 0.04
• Number of unique changes: 0.04
• Overall experience of committer: 0.04
• Recent experience of committer: 0.03
Coupling
• Number of highly coupled files: 0.00
• Number of coupled files for all degrees: 0.01
• Number of non-modified coupled files: 0.01
1. Churn
2. Size
3. #Committers
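As a quick arithmetic check, the importances on this slide can be summed per group. The sketch assumes the grouping into Churn, Other features, and Coupling as the slide appears to lay it out (four, nine, and three features); the values are copied from the slide.

```python
importance = {
    "Churn": [0.17, 0.04, 0.08, 0.07],
    "Other features": [0.11, 0.09, 0.16, 0.03, 0.08, 0.04, 0.04, 0.04, 0.03],
    "Coupling": [0.00, 0.01, 0.01],
}
# Round to two decimals to hide floating-point noise in the sums.
group_totals = {group: round(sum(values), 2) for group, values in importance.items()}
print(group_totals)  # {'Churn': 0.36, 'Other features': 0.62, 'Coupling': 0.02}
```

The three totals add up to 1.00, and the coupling features contribute very little under this grouping.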
26. Answering the RQs…
RQ1: Effects of oversampling and undersampling?
• Baseline sampling too conservative (<3% recall)
• Oversampling is essential
RQ2: Difference between cross-validation and a time-sensitive evaluation?
• Disregarding time gives overly positive recall (twice as high)
• Go beyond cross-validation
But 10-15% F-score is low…
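For reference, this is how an F-score combines precision and recall, assuming the F-score here is the balanced F1 measure. The input values are hypothetical and are not results from the study.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical values, not results from the study.
print(round(f_score(0.10, 0.20), 3))  # 0.133
```

Because the harmonic mean is dominated by the smaller value, a higher recall alone does not lift the F-score much when precision stays low.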
27. Current focus: SZZ for Faster Automatic Program Repair
[Figure: locating a regression fault among commits, by binary search vs. by ML for risk profiling of commits]
• Complement training data with bug-introducing commits from SZZ
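The binary search shown on this slide can be sketched as follows. It assumes that commits are in chronological order, that the newest commit fails a regression test, and that every commit after the fault-introducing one fails as well. The function name `first_faulty_commit` and the toy commit ids are hypothetical.

```python
def first_faulty_commit(commits, is_faulty):
    """Binary search for the first commit at which a regression test fails.

    Returns the commit and the number of test runs spent.
    """
    low, high, runs = 0, len(commits) - 1, 0
    while low < high:
        mid = (low + high) // 2
        runs += 1
        if is_faulty(commits[mid]):
            high = mid       # fault is at mid or earlier
        else:
            low = mid + 1    # fault was introduced after mid
    return commits[low], runs

# Hypothetical toy history: 16 commits, the fault enters at c11.
commits = [f"c{i}" for i in range(16)]
result = first_faulty_commit(commits, lambda c: int(c[1:]) >= 11)
print(result)  # ('c11', 4)
```

With 16 commits the search needs 4 test runs, which is the cost that a risk profile of commits would aim to reduce.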
28. Feed ML with SZZ output
SZZ Unleashed is on GitHub
markus.borg@ri.se
@mrksbrg
mrksbrg.com