10 Year Impact Award Presentation - Duplicate Bug Reports Considered Harmful ... Really?

Duplicate Bug Reports
Considered Harmful…
Really?
Nicolas Bettenburg • Rahul Premraj • Tom Zimmerman • Sunghun Kim 
ICSME’2018 (Madrid) • September 28th, 2018

Automated Severity Assessment of Software Defect Reports
Tim Menzies
Lane Department of Computer Science,
West Virginia University
PO Box 6109, Morgantown, WV, 26506
304 293 0405
tim@menzies.us
Andrian Marcus
Department of Computer Science
Wayne State University
Detroit, MI 48202
313 577 5408
amarcus@wayne.edu
Abstract
In mission critical systems, such as those developed
by NASA, it is very important that the test engineers
properly recognize the severity of each issue they
identify during testing. Proper severity assessment is
essential for appropriate resource allocation and
planning for fixing activities and additional testing.
Severity assessment is strongly influenced by the
experience of the test engineers and by the time they
spend on each issue.
The paper presents a new and automated method
named SEVERIS (SEVERity ISsue assessment), which
assists the test engineer in assigning severity levels to
defect reports. SEVERIS is based on standard text
mining and machine learning techniques applied to
existing sets of defect reports. A case study on using
SEVERIS with data from NASA’s Project and Issue
Tracking System (PITS) is presented in the paper. The
case study results indicate that SEVERIS is a good
predictor for issue severity levels, while it is easy to
use and efficient.
1. Introduction
NASA’s software Independent Verification and
Validation (IV&V) Program captures all of its findings
in a database called the Project and Issue Tracking
System (PITS). The data in PITS has been collected
for more than 10 years and includes issues on robotic
satellite missions and human-rated systems.
Nowadays, similar defect tracking systems, such as
Bugzilla1
, have become very popular, largely due to the
spread of open source software development. These
systems help to track bugs and changes in the code, to
submit and review patches, to manage quality
assurance, to support communication between
developers, etc.
As compared to newer systems, the problem with
PITS is that there is a lack of consistency in how each
1
http://www.bugzilla.org/
of the projects collected issue data. In most instances,
the specific configuration of the information captured
about an issue was tailored by the IV&V project to
meet its needs. This has created consistency problems
when metrics data is pulled across projects. While
there was a set of required data fields, the majorities of
those fields do not provide information in regards to
the quality of the issue and are not very suitable for
comparing projects.
A common issue among defect tracking systems is
that they are useful for storing day-to-day information
and generating small-scale tactical reports (e.g., “list
the bugs we found last Tuesday”), but difficult to use
for high-end business strategic analysis (e.g., “in the
past, what methods have proved most cost effective in
finding bugs?”). Another issue common to these
systems is that most of the data is unstructured (i.e.,
free text). Specific to PITS is that the database fields
in PITS keep changing, yet the nature of the
unstructured text remains constant. In consequence,
one logical choice in the analysis of defect reports is a
combination of text mining and machine learning.
In this paper we present a new approach for
extracting general conclusions from PITS data based
on text mining and machine learning methods, which
are low cost, automatic, and rapid. We designed and
built a tool named SEVERIS (SEVERity ISsue
assessment) to automatically review issue reports and
alert when a proposed severity is anomalous. The way
SEVRIS is built provides the probabilities that the
assessment is correct. These probabilities can be used
to guide decision making in this process. Assigning
the correct severity levels to issue reports is extremely
important in the process employed at NASA, as it
directly impacts resource allocation and planning of
subsequent defect fixing activities.
NASA uses a five-point scale to score issue
severity. The scale ranges one to five, worst to dullest,
respectively. A different scale is used for robotic and
human-rated missions (see Table 1).

Tim Menzies
304 293 0405
tim@menzies.us
Andrian Marcus
Detroit, MI 48202
313 577 5408
amarcus@wayne.edu
Abstract
use and efficient.
1. Introduction
Bugzilla1
developers, etc.
1
comparing projects.
Predicting which bugs
get fixed. 
Guo et al.

Tim Menzies
304 293 0405
tim@menzies.us
Andrian Marcus
Detroit, MI 48202
313 577 5408
amarcus@wayne.edu
Abstract
use and efficient.
1. Introduction
Bugzilla1
developers, etc.
1
comparing projects.
get fixed. 
Guo et al.
Predicting Severity of
a reported bug. 
Lamkanfi et al.

Tim Menzies
304 293 0405
tim@menzies.us
Andrian Marcus
Detroit, MI 48202
313 577 5408
amarcus@wayne.edu
Abstract
use and efficient.
1. Introduction
Bugzilla1
developers, etc.
1
comparing projects.
get fixed. 
Guo et al.
a reported bug. 
Lamkanfi et al.
Characterizing re-
opened bugs. 
Zimmermann et al.

Tim Menzies
304 293 0405
tim@menzies.us
Andrian Marcus
Detroit, MI 48202
313 577 5408
amarcus@wayne.edu
Abstract
use and efficient.
1. Introduction
Bugzilla1
developers, etc.
1
comparing projects.
get fixed. 
Guo et al.
a reported bug. 
Lamkanfi et al.
Characterizing re-
opened bugs. 
Zimmermann et al.
What makes a good
bug report.
Bettenburg et al.

Tim Menzies
304 293 0405
tim@menzies.us
Andrian Marcus
Detroit, MI 48202
313 577 5408
amarcus@wayne.edu
Abstract
use and efficient.
1. Introduction
Bugzilla1
developers, etc.
1
comparing projects.
get fixed. 
Guo et al.
a reported bug. 
Lamkanfi et al.
Characterizing re-
opened bugs. 
Zimmermann et al.
What makes a good
bug report.
Bettenburg et al.
Do clones matter?
Juergens et al.

Tim Menzies
304 293 0405
tim@menzies.us
Andrian Marcus
Detroit, MI 48202
313 577 5408
amarcus@wayne.edu
Abstract
use and efficient.
1. Introduction
Bugzilla1
developers, etc.
1
comparing projects.
get fixed. 
Guo et al.
a reported bug. 
Lamkanfi et al.
Characterizing re-
opened bugs. 
Zimmermann et al.
What makes a good
bug report.
Bettenburg et al.
Do clones matter?
Juergens et al.
Frequency and Risks
of changes to clones.
Göde et al.

Tim Menzies
304 293 0405
tim@menzies.us
Andrian Marcus
Detroit, MI 48202
313 577 5408
amarcus@wayne.edu
Abstract
use and efficient.
1. Introduction
Bugzilla1
developers, etc.
1
comparing projects.
get fixed. 
Guo et al.
a reported bug. 
Lamkanfi et al.
Characterizing re-
opened bugs. 
Zimmermann et al.
What makes a good
bug report.
Bettenburg et al.
Do clones matter?
Juergens et al.
Frequency and Risks
Göde et al.
Do developers care
about code smells?
Yamashita et al.

Tim Menzies
304 293 0405
tim@menzies.us
Andrian Marcus
Detroit, MI 48202
313 577 5408
amarcus@wayne.edu
Abstract
use and efficient.
1. Introduction
Bugzilla1
developers, etc.
1
comparing projects.
get fixed. 
Guo et al.
a reported bug. 
Lamkanfi et al.
Characterizing re-
opened bugs. 
Zimmermann et al.
What makes a good
bug report.
Bettenburg et al.
Do clones matter?
Juergens et al.
Frequency and Risks
Göde et al.
Do developers care
about code smells?
Yamashita et al.
Inconsistent Changes
to Clones at Release
Level. Bettenburg et
al.

March OctoberJanuary June November
2007

There are many, varied stories behind
the observed SE artifacts.

Ignoring available data could lead to
missing fundamentally important
insights

When the same bug is reported several times
in Bugzilla, developers are slowed down
https://fedoraproject.org/wiki/How_to_file_a_bug_report#Avoiding_Duplicate_Bug_Reports

A duplicate bug is a burden in the testing cycle.
https://www.softwaretestinghelp.com/how-to-write-good-bug-report/

Several duplicate bug reports just cause an
administration headache for developers
http://wicket.apache.org/help/reportabug.html

Duplicate bug reports, […] consume time of bug triagers
and software developers that might better be spent
working on reports that describe unique requests.
Lyndon Hiew , MSc. Thesis, 2006, UBC
Several duplicate bug reports just cause an
administration headache for developers
http://wicket.apache.org/help/reportabug.html

DON’T BE
THAT GUY
who submitted a
DUPLICATE

It doesn't even mean that that the resolved
bug report can now be ignored, since we
have seen instances of late- identiﬁcation
of duplicates (e.g., BR-C in Figure 2) in
which accumulated knowledge and
dialogue may still be relevant to the
resolution of the other bug reports in the
BRN.
Robert J. Sandusky, Les Gasser, and Gabriel Ripoche. Bug report networks: Varieties, strategies, and impacts in an oss
development community. In Proc. of ICSE Workshop on Mining Software Repositories, 2004.

“Duplicates are not really problems.
They often add useful information.
That this information were ﬁled under
a new report is not ideal though.”
N. Bettenburg, S. Just, A. Schröter, C. Weiss, R. Premraj, and T. Zimmermann. What makes a good bug report? In
Proceedings of the 16th International Symposium on Foundations of Software Engineering, November 2008.

can
gly
m-
ese
to
in-
item h hm P(hm | h)
steps to reproduce 47 42 0.8936
stack traces 45 35 0.7778
screenshots 42 17 0.4048
test cases 39 11 0.2821
observed behavior 44 12 0.2727
code examples 38 9 0.2368
error reports 33 3 0.0909
build information 34 3 0.0882
summary 36 3 0.0833
expected behavior 41 3 0.0732
version 38 1 0.0236
component 34 0 0.0000
hardware 13 0 0.0000
operating system 34 0 0.0000
product 30 0 0.0000
severity 26 0 0.0000
Table 1. Lists all items from the first survey part with
the count how often they helped (h), how often they helped
the most (hm), and the probability that an item helped most
under the condition that it helped.
5. Metric
Now that we got an idea about important information contained
in a bug report and have a sample of reports ranked by experts we
item a am P(am|a)
errors in steps to reproduce 34 29 0.8235
incomplete information 44 35 0.7727
wrong observed behavior 15 11 0.6667
wrong version number 21 8 0.2857
errors in test cases 14 4 0.2857
unstructured text 19 7 0.2632
wrong operating system 8 3 0.2500
wrong expected behavior 18 7 0.2222
non-technical language 14 3 0.2143
too long text 11 2 0.1818
errors in code examples 11 2 0.1818
bad grammar 29 5 0.1724
wrong component name 22 2 0.0909
prose text 12 2 0.0833
duplicates 31 2 0.0645
no spellcheck 8 0 0.0000
wrong hardware 5 0 0.0000
spam 1 0 0.0000
wrong product name 11 0 0.0000
errors in strack traces 2 0 0.0000
Table 2. Lists all items from the second part with the
count how often they harmed (a), how often they harmed
the most (am), and the probability that an item harmed most
under the condition that it harmed.
was filled out by 48 out of 365 developers in total. Secondly we
present the results of our metric which we compare to the expert
opinions we gained from the ugly reports study.
T
r
m
repo
tom
acce
W
vide
have
0.3.
7. D
S
tion.
the r
we c
item a am P(am|a)
errors in steps to reproduce 34 29 0.8235
incomplete information 44 35 0.7727
wrong observed behavior 15 11 0.6667
wrong version number 21 8 0.2857
errors in test cases 14 4 0.2857
unstructured text 19 7 0.2632
wrong operating system 8 3 0.2500
wrong expected behavior 18 7 0.2222
non-technical language 14 3 0.2143
too long text 11 2 0.1818
errors in code examples 11 2 0.1818
bad grammar 29 5 0.1724
wrong component name 22 2 0.0909
prose text 12 2 0.0833
duplicates 31 2 0.0645
no spellcheck 8 0 0.0000
wrong hardware 5 0 0.0000
spam 1 0 0.0000
wrong product name 11 0 0.0000
errors in strack traces 2 0 0.0000
Table 2. Lists all items from the second part with the
count how often they harmed (a), how often they harmed
the most (am), and the probability that an item harmed most
under the condition that it harmed.
was filled out by 48 out of 365 developers in total. Secondly we
rep
tom
acc
vid
hav
0.3
7.
What Helps the Most? What Harms the Most?
N. Bettenburg, S. Just, A. Schröter, C. Weiss, R. Premraj, and T. Zimmermann. What makes a good bug report? In Proceedings of the 16th International
Symposium on Foundations of Software Engineering, November 2008.

PART 1
Is there extra information
in duplicate reports and
if so, can we quantify
how much?
PART 2
Is that extra information

helpful for carrying out
software engineering
tasks?

●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
● ● ●
●
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
● ●
●
● ●
●
Reports/Month
●
● ●
● ●
●
●
●
●
● ● ● ●
● ● ●
●
●
● ● ● ● ● ● ● ● ● ●
●
● ●
● ●
●
●
●
● ● ● ● ● ● ●
● ●
● ● ● ●
●
● ●
●
●
● ●
●
● ● ● ● ● ● ● ●
● ●
●
● ● ● ● ● ●
●
010002000300040005000
1.0 2.0 2.1 2.1.1 2.1.2 2.1.3 3.0 3.0.1 3.0.2 3.1 3.1.1 3.1.2 3.2 3.2.2 3.3 Milestones
Oct'01 Jan'02 Apr'02 Jul'02 Oct'02 Jan'03 Apr'03 Jul'03 Oct'03 Jan'04 Apr'04 Jul'04 Oct'04 Jan'05 Apr'05 Jul'05 Oct'05 Jan'06 Apr'06 Jul'06 Oct'06 Jan'07 Apr'07 Jul'07 Oct'07
● Reports submitted (total) ● Duplicates submitted
~ 3,000 reports submitted per month

~ 13% duplicate bug reports
First, we need DATA … lot’s of DATA!

100,000
200,000
300,000
400,000
500,000
Mozilla
Bug Reports without duplicates Duplicate Reports Master Reports
269,222
116,727
36,697
50,000
100,000
150,000
200,000
250,000
Eclipse
167,494
27,838
16,511
Figure 4.1: Graphical representation of the collected bug report data.
The MOZILLA database was mined using a tool that reads the XML repre-
Inverse Duplicate Problem
27% (Mozilla)

31% (Eclipse)

Bug 137808
Summary: Exceptions from createFromString lock-up the editor
Product: [Modeling] EMF Reporter: Patrick Sodre <psodre@gmail.com>
Component: Core Assignee: Marcelo Paternostro <marcelop@ca.ibm.com>
Status: VERIFIED FIXED QA Contact:
Severity: normal
Priority: P3 CC: merks@ca.ibm.com
Version: 2.2
Target Milestone: ---
Hardware: PC
OS: Windows XP
Whiteboard:
Description:
Opened: 2006-04-20 14:25 -
0400
As discussed on the newsgroup under the Thread with the same name I am opening
this bug entry. Here is a history of the thread.
-- From Ed Merks
Patrick,
The value is checked before it's applied and can't be applied until it's valid.
But this BigDecimal cases behaves oddly because the exception thrown by
new BigDecimal("badvalue")
has a null message and the property editor relies on returning a non-null
message string to indicate there is an error.
Please open a bugzilla which I'll fix like this:
### Eclipse Workspace Patch 1.0
#P org.eclipse.emf.edit.ui
Index: src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java
===================================================================
RCS file:
/cvsroot/tools/org.eclipse.emf/plugins/org.eclipse.emf.edit.ui/src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java,v
retrieving revision 1.10
diff -u -r1.10 PropertyDescriptor.java
--- src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 21 Mar 2006
16:42:30 -0000 1.10
+++ src/org/eclipse/emf/edit/ui/provider/PropertyDescriptor.java 20 Apr 2006
11:59:10 -0000
@@ -162,7 +162,8 @@
}
catch (Exception exception)
{
- return exception.getMessage();
+ String message = exception.getMessage();
+ return message == null ? exception.toString() : message;
}
}
Diagnostic diagnostic =
Diagnostician.INSTANCE.validate(EDataTypeCellEditor.this.eDataType, value);
Patrick Sodre wrote:
Hi,
It seems that if the user inputs an invalid parameter that gets created from
"createFromString" the Editor locks-up until the user explicitly calls "restore
Default Value".
Is this the expected behavior or could something better be done? For
instance if an exception is thrown restore the value back to what it was before
after displaying a pop-up error message.
I understand that for DataTypes defined by the user he/she should take care
of catching the exceptions but for the default ones like BigInteger/BigDecimal
I think the EMF runtime could do some of the grunt work...
If you think this is something worth pursuing I could post an entry in
Bugzilla.
Regards,
Patrick Sodre
Below is the stack trace that I got from the Editor...
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:368)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createEBigDecimalFromString(EcoreFactoryImpl.java:559)
at
org.eclipse.emf.ecore.impl.EcoreFactoryImpl.createFromString(EcoreFactoryImpl.java:116)
at
org.eclipse.emf.edit.ui.provider.PropertyDescriptor$EDataTypeCellEditor.doGetValue(PropertyDescriptor.java:183)
at org.eclipse.jface.viewers.CellEditor.getValue(CellEditor.java:449)
at
org.eclipse.ui.views.properties.PropertySheetEntry.applyEditorValue(PropertySheetEntry.java:135)
at
org.eclipse.ui.views.properties.PropertySheetViewer.applyEditorValue(PropertySheetViewer.java:249)
at
------- Comment #1 From Ed Merks 2006-04-20 15:09:23 -0400 -------
The fix has been committed to CVS. Thanks for reporting this problem.
------- Comment #2 From Marcelo Paternostro 2006-04-27 10:44:24 -0400 -------
Fixed in the I200604270000 built
------- Comment #3 From Nick Boldt 2008-01-28 16:46:51 -0400 -------
Move to verified as per bug 206558.
Extracting Structural Information from Bug Reports (MSR 2008)

Bug 137808
Severity: normal
Version: 2.2
Hardware: PC
OS: Windows XP
Whiteboard:
Description:
Opened: 2006-04-20 14:25 -
0400
-- From Ed Merks
Patrick,
===================================================================
RCS file:
16:42:30 -0000 1.10
11:59:10 -0000
@@ -162,7 +162,8 @@
}
{
}
}
Hi,
Default Value".
Bugzilla.
Regards,
Patrick Sodre
at
at
at
at
at
at
METADATA

Bug 137808
Severity: normal
Version: 2.2
Hardware: PC
OS: Windows XP
Whiteboard:
Description:
Opened: 2006-04-20 14:25 -
0400
-- From Ed Merks
Patrick,
===================================================================
RCS file:
16:42:30 -0000 1.10
11:59:10 -0000
@@ -162,7 +162,8 @@
}
{
}
}
Hi,
Default Value".
Bugzilla.
Regards,
Patrick Sodre
at
at
at
at
at
at
SOURCE CODE
METADATA

Bug 137808
Severity: normal
Version: 2.2
Hardware: PC
OS: Windows XP
Whiteboard:
Description:
Opened: 2006-04-20 14:25 -
0400
-- From Ed Merks
Patrick,
===================================================================
RCS file:
16:42:30 -0000 1.10
11:59:10 -0000
@@ -162,7 +162,8 @@
}
{
}
}
Hi,
Default Value".
Bugzilla.
Regards,
Patrick Sodre
at
at
at
at
at
at
SOURCE CODE
PATCHES
METADATA

Bug 137808
Severity: normal
Version: 2.2
Hardware: PC
OS: Windows XP
Whiteboard:
Description:
Opened: 2006-04-20 14:25 -
0400
-- From Ed Merks
Patrick,
===================================================================
RCS file:
16:42:30 -0000 1.10
11:59:10 -0000
@@ -162,7 +162,8 @@
}
{
}
}
Hi,
Default Value".
Bugzilla.
Regards,
Patrick Sodre
at
at
at
at
at
at
SCREENSHOTS
SOURCE CODE
PATCHES
METADATA

Bug 137808
Severity: normal
Version: 2.2
Hardware: PC
OS: Windows XP
Whiteboard:
Description:
Opened: 2006-04-20 14:25 -
0400
-- From Ed Merks
Patrick,
===================================================================
RCS file:
16:42:30 -0000 1.10
11:59:10 -0000
@@ -162,7 +162,8 @@
}
{
}
}
Hi,
Default Value".
Bugzilla.
Regards,
Patrick Sodre
at
at
at
at
at
at
SCREENSHOTS
SOURCE CODE
PATCHES
STACK TRACES
METADATA

3.6 Order of Extraction
PATCHES STACK TRACES SOURCE CODE ENUMERATIONS
loremm ipsum dolor met e4a
this is a public String {
dosomeThing();
}
We have the following
problem:
- first you have to do
- then you must do
We propos the following
patch file to be used:
Index: someFile.java
=====================
INPUT Index: PatchFilter.java
==================
RCS File: PatchFilter.java
--- PatchFilter.java
23.10.2007
+++ PatchFilter.java
24.10.2007
@@+7,13-7,14@@
This is a sample context
line
- This line will be removed
+ this line will be added
instead
PATCH
Index: PatchFilter.java
==================
23.10.2007
24.10.2007
@@+7,13-7,14@@
line
instead
TRACE
==================
23.10.2007
24.10.2007
@@+7,13-7,14@@
line
instead
CODE
dosomeThing();
}
problem:
- then you must do
=====================
OUTPUT
Figure 3.10: We extract structural elements in a ﬁxed sequence.
The order in which the detection and extraction of elements is executed, is
of great importance. Several structural elements interfere:
• Patches vs. Enumerations
Enumerations, especially itemization interfere with the hunk lines in
patches. Both use the symbols “+” and “-”.

3.6 Order of Extraction
PATCHES STACK TRACES SOURCE CODE ENUMERATIONS
dosomeThing();
}
problem:
- then you must do
=====================
INPUT Index: PatchFilter.java
==================
23.10.2007
24.10.2007
@@+7,13-7,14@@
line
instead
PATCH
==================
23.10.2007
24.10.2007
@@+7,13-7,14@@
line
instead
TRACE
==================
23.10.2007
24.10.2007
@@+7,13-7,14@@
line
instead
CODE
dosomeThing();
}
problem:
- then you must do
=====================
OUTPUT
Figure 3.10: We extract structural elements in a fixed sequence.
The order in which the detection and extraction of elements is executed, is
of great importance. Several structural elements interfere:
• Patches vs. Enumerations
Enumerations, especially itemization interfere with the hunk lines in
patches. Both use the symbols “+” and “-”.
reports. The evaluation is split into two parts: first, we want to focus on
the correct identification of the presence of enumerations, patches, stack
traces and source code in bug reports. Knowing the the reliability of our
approach, we can then proceed in identifying how good the detected elements
are extracted by our methods.
Evaluation Setup
We parsed 161,500 bug reports from the ECLIPSE project which were submit-
ted between October 2001 and December 2007. For each report, INFOZILLA
verified the presence of each of the four structural element types. For each
element, it classified the report into one of two bins: B1 (report has Element)
and B2 (report does not have Element).
dosomeThing();
}
problem:
- then you must do
=====================
INPUT
Has
Element?
No
Yes B1
B2
Figure 3.11: For each element we classified the report into two bins.

Master Report
BUGthisasd
asdlknasdklnasdlk
askdnaklsdn
aksdnlaksdnlkasdkn
asd
sadddda
asdaddasd
aksdnlaskdnlkansd
Elements
Extended Report
BUGthisasd
asdlknasdklnasdlk
askdnaklsdn
aksdnlaksdnlkasdkn
asd
sadddda
asdaddasd
aksdnlaskdnlkansd BUGthisasd
asdlknasdklnasdlk
askdnaklsdn
aksdnlkasdkn
asdasdasdasdasd
a
s
adddda
a
daddasd
asdasdasdasdasd
askdnlkansd
Elements
compare

5.2 Results 35
Average per master report
Information item Master Extended Change⇤
Predefined fields
– product 1.000 1.127 +0.127
– component 1.000 1.287 +0.287
– operating system 1.000 1.631 +0.631
– reported platform 1.000 1.241 +0.241
– version 0.927 1.413 +0.486
– reporter 1.000 2.412 +1.412
– priority 1.000 1.291 +0.291
– target milestone 0.654 0.794 +0.140
Patches
– total 1.828 1.942 +0.113
– unique: patched files 1.061 1.124 +0.062
Screenshots
– total 0.139 0.285 +0.145
– unique: filename, filesize 0.138 0.281 +0.143
Stacktraces
– total 0.504 1.422 +0.918
– unique: exception 0.195 0.314 +0.118
– unique: exception, top frame 0.223 0.431 +0.207
– unique: exception, top 2 frames 0.229 0.458 +0.229
⇤
For all information items the increase is significant at p < .001.
Table 5.1: Average amount of information added by duplicates.
A reporter’s reputation can go a long way in influencing the future course of a
36 5. Additional Information in Duplicate Reports
Average per master report
Information item Master Extended Change⇤
Predefined fields
– product 1.000 1.400 +0.400
– component 1.000 1.953 +0.953
– operating system 1.000 2.102 +1.102
– reported platform 1.000 1.544 +0.544
– version 0.814 0.979 +0.165
– reporter 1.000 3.705 +2.705
– priority 0.377 0.499 +0.122
– target milestone 0.433 0.558 +0.125
Patches
– total 5.038 5.184 +0.146
– unique: patched files 2.003 2.067 +0.064
Screenshots
– total 0.200 0.391 +0.191
– unique: filename, filesize 0.197 0.385 +0.187
Stacktraces
– total 0.100 0.185 +0.085
– unique: exception 0.033 0.047 +0.014
– unique: exception, top frame 0.069 0.130 +0.061
⇤
For all information items the increase is significant at p < .001.
Table 5.2: Average amount of information added by duplicates.
We compared stack traces considering the exception that was thrown and
ECLIPSE MOZILLA
ADDITIONAL INFORMATION

Duplicate bug reports can provide useful additional information.

For example, we can ﬁnd up to three times the stack traces
which are helpful in ﬁxing bugs

There is significant evidence of
additional information in duplicate
bug reports that is uniquely different
from the information already reported.

DeveloperReport
BUG
The Triage Problem

DeveloperReport
BUG
Fixed
BUG
✓
The Triage Problem

BUG
DeveloperReport
BUG
Fixed
BUG
✓
BUG
BUG
BUG
BUG
BUG
BUG
The Triage Problem

BUG
DeveloperReport
BUG
Fixed
BUG
✓
BUG
BUG
BUG
BUG
BUG
BUG Triager
The Triage Problem

A1
A2
An
...
MASTER
Class 3
A1
A2
An
...
DUPLICATE n
Class 2
A1
A2
An
...
DUPLICATE 1
Class 3
A1
A2
An
...
DUPLICATE n
Class 3
...
A1
A2
An
...
MASTER
Class 2
A1
A2
An
...
MASTER
Class 3
...
A1
A2
An
...
DUPLICATE n
Class 1
A1
A2
An
...
DUPLICATE 1
Class 2
A1
A2
An
...
DUPLICATE n
Class 2
A1
A2
An
...
DUPLICATE 1
Class 3
A1
A2
An
...
DUPLICATE n
Class 3
...
A1
A2
An
...
MASTER
Class 2
A1
A2
An
...
MASTER
Class 1
A1
A2
An
...
MASTER
Class 3
...
A1
A2
An
...
DUPLICATE 1
Class 1
A1
A2
An
...
DUPLICATE n
Class 1
...
A1
A2
An
...
DUPLICATE 1
Class 2
A1
A2
An
...
DUPLICATE n
Class 2
A1
A2
An
...
DUPLICATE 1
Class 3
A1
A2
An
...
DUPLICATE n
Class 3
...
“Whoever was assigned to the Master
should have been assigned to any of the
Duplicates.”
“Only the person who was originally
assigned to a report can ﬁx it.”
“Any person assigned to any of the reports in
the duplicate group can provide a ﬁx.”

Master reports, sorted chronologically
Training
Training
Training Testing
Fold 1 Fold 2 Fold 3 Fold 11
Testing
Testing
. . . . . . . . . . . . . . . . . .
.......
Split into
Run 1
Run 2
Run 10

46 6. Additional Information can Help Developers
Table 6.1: Percentages of reports correctly triaged to ECLIPSE developers.
Run
Model Result Training 1 2 3 4 5 6 7 8 9 10 All
SVM
Top 1
Master 15.45 19.28 19.03 19.80 25.80 26.44 22.09 27.08 27.71 29.12 23.18
Extended 18.39⇤
20.95 22.22⇤
21.46 27.84 28.48 23.37 30.52⇤
30.78⇤
30.52 25.45⇤
Top 3
Master 32.44 37.42 40.87 39.72 46.10 46.36 38.95 44.70 48.53 47.25 42.23
Extended 38.70⇤
42.78⇤
43.30 39.34 50.83⇤
49.55⇤
42.40⇤
50.32⇤
50.32 55.04⇤
46.25⇤
Top 5
Master 41.89 46.87 47.38 47.64 54.66 56.96 47.51 52.36 56.58 56.45 50.83
Extended 47.38⇤
52.11⇤
53.00⇤
51.85⇤
60.54⇤
59.90⇤
51.09⇤
58.11⇤
60.28⇤
65.26⇤
55.95⇤
Bayes
Top 1
Master 14.81 16.60 17.75 17.75 22.73 21.20 20.56 23.50 27.71 28.22 21.08
Extended 15.45 17.11 20.56⇤
18.01 19.80⇤
19.80 22.99 27.08⇤
26.82 30.40⇤
21.80
Top 3
Master 29.12 32.31 35.12 34.99 40.36 38.06 35.76 43.55 45.59 46.87 38.17
Extended 36.53⇤
33.08 38.83⇤
35.50 39.08 39.08 39.97⇤
46.23 45.85 50.45⇤
40.46⇤
Top 5
Master 38.44 42.40 45.72 45.21 50.70 47.64 44.06 51.85 54.92 55.17 47.61
Extended 45.72⇤
44.70 48.02 43.55 48.91 50.45⇤
49.43⇤
55.30⇤
54.28 58.49⇤
49.88⇤
⇤ Increase in accuracy is significant at p = .05
Table 6.2: Percentages of reports correctly triaged to MOZILLA developers.
Run
Top 1
Master 14.57 14.30 14.16 18.29 18.83 19.17 21.00 19.65 19.99 22.15 18.21
Extended 15.31 14.43 17.95 19.44 19.78 19.51 21.82 23.10 18.29 19.31 18.90
onal Information can Help Developers
rectly triaged to ECLIPSE developers.
Run
4 5 6 7 8 9 10 All
19.80 25.80 26.44 22.09 27.08 27.71 29.12 23.18
21.46 27.84 28.48 23.37 30.52⇤
30.78⇤
30.52 25.45⇤
39.72 46.10 46.36 38.95 44.70 48.53 47.25 42.23
39.34 50.83⇤
49.55⇤
42.40⇤
50.32⇤
50.32 55.04⇤
46.25⇤
47.64 54.66 56.96 47.51 52.36 56.58 56.45 50.83
51.85⇤
60.54⇤
59.90⇤
51.09⇤
58.11⇤
60.28⇤
65.26⇤
55.95⇤
17.75 22.73 21.20 20.56 23.50 27.71 28.22 21.08
18.01 19.80⇤
19.80 22.99 27.08⇤
26.82 30.40⇤
21.80
34.99 40.36 38.06 35.76 43.55 45.59 46.87 38.17
35.50 39.08 39.08 39.97⇤
46.23 45.85 50.45⇤
40.46⇤
45.21 50.70 47.64 44.06 51.85 54.92 55.17 47.61
43.55 48.91 50.45⇤
49.43⇤
55.30⇤
54.28 58.49⇤
49.88⇤
rectly triaged to MOZILLA developers.
Run
4 5 6 7 8 9 10 All
18.29 18.83 19.17 21.00 19.65 19.99 22.15 18.21
19.44 19.78 19.51 21.82 23.10 18.29 19.31 18.90
Bayes Top 3
Master 29.12 32.31 35.12 34.99 40.36 38.06 35.76 43.55 45.59 46.87 38.17
Extended 36.53⇤
33.08 38.83⇤
35.50 39.08 39.08 39.97⇤
46.23 45.85 50.45⇤
40.46⇤
Top 5
Master 38.44 42.40 45.72 45.21 50.70 47.64 44.06 51.85 54.92 55.17 47.61
Extended 45.72⇤
44.70 48.02 43.55 48.91 50.45⇤
49.43⇤
55.30⇤
54.28 58.49⇤
49.88⇤
Table 6.2: Percentages of reports correctly triaged to MOZILLA developers.
Run
SVM
Top 1
Master 14.57 14.30 14.16 18.29 18.83 19.17 21.00 19.65 19.99 22.15 18.21
Extended 15.31 14.43 17.95 19.44 19.78 19.51 21.82 23.10 18.29 19.31 18.90
Top 3
Master 28.59 28.46 31.84 37.53 36.52 39.30 41.26 44.58 42.82 43.09 37.40
Extended 32.38 30.15 36.86 39.70 37.26 40.72 43.29 47.83 42.48 39.36 39.00
Top 5
Master 37.13 36.04 41.67 46.41 44.99 48.92 50.75 56.03 53.52 51.22 46.67
Extended 42.48 39.77 46.07 49.80 49.05 54.27 53.32 60.57 54.74 49.66 49.98
Bayes
Top 1
Master 15.11 12.60 16.94 17.62 17.01 19.44 18.22 25.81 25.47 27.98 19.62
Extended 15.24 13.75 18.50 20.39 19.78 23.51 23.31 26.22 24.46 25.88 21.10
Top 3
Master 27.71 29.67 34.42 37.94 35.70 40.18 40.04 44.58 45.33 43.90 37.94
Extended 32.11 29.40 36.72 39.50 39.70 44.24 44.24 48.85 45.87 44.17 40.48
Top 5
Master 35.77 37.74 43.09 47.09 44.99 51.90 49.46 54.13 55.15 51.83 47.11
Extended 40.72 39.63 45.05 49.66 48.58 54.47 54.74 59.49 55.76 53.52 50.16
Importantly, all but the Top 1 results using Naïve Bayes in the last column
were significant, too. Thus, the results demonstrate that bug reports can be
better triaged by considering a larger set of existing bug reports by including
duplicate reports.
Bayes Top 3
Master 29.12 32.31 35.12 34.
Extended 36.53⇤
33.08 38.83⇤
35.
Top 5
Master 38.44 42.40 45.72 45.
Extended 45.72⇤
44.70 48.02 43.
Table 6.2: Percentages of reports correc
Model Result Training 1 2 3
SVM
Top 1
Master 14.57 14.30 14.16 18.
Extended 15.31 14.43 17.95 19.
Top 3
Master 28.59 28.46 31.84 37.
Extended 32.38 30.15 36.86 39.
Top 5
Master 37.13 36.04 41.67 46.
Extended 42.48 39.77 46.07 49.
Bayes
Top 1
Master 15.11 12.60 16.94 17.
Extended 15.24 13.75 18.50 20.
Top 3
Master 27.71 29.67 34.42 37.
Extended 32.11 29.40 36.72 39.
Top 5
Master 35.77 37.74 43.09 47.
Extended 40.72 39.63 45.05 49.
Importantly, all but the Top 1 results us
were significant, too. Thus, the results d
better triaged by considering a larger set
duplicate reports.
ECLIPSE MOZILLA

The information contained in Duplicate
reports the improves accuracy of
Machine Learning algorithms when
solving for the Bug Triage problem.

10 Year Impact Award Presentation - Duplicate Bug Reports Considered Harmful ... Really?

10 Year Impact Award Presentation - Duplicate Bug Reports Considered Harmful ... Really?

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (13)

Similaire à 10 Year Impact Award Presentation - Duplicate Bug Reports Considered Harmful ... Really?

Similaire à 10 Year Impact Award Presentation - Duplicate Bug Reports Considered Harmful ... Really? (20)

Plus de Nicolas Bettenburg

Plus de Nicolas Bettenburg (20)

Dernier

Dernier (20)

10 Year Impact Award Presentation - Duplicate Bug Reports Considered Harmful ... Really?