Conference presentation from CSMR 2013, Genova, Italy.
Abstract: Completely analyzed and closed issue reports in software development projects, particularly in the development of safety-critical systems, often carry important information about issue-related change locations. These locations may be in the source code, as well as traces to test cases affected by the issue, and related design and requirements documents. In order to help developers analyze new issues, knowledge about issue clones and duplicates, as well as other relations between the new issue and existing issue reports would be useful. This paper analyses, in an exploratory study, issue reports contained in two Issue Management Systems (IMS) containing approximately 20,000 issue reports. The purpose of the analysis is to gain a better understanding of relationships between issue reports in IMSs. We found that link-mining explicit references can reveal complex networks of issue reports. Furthermore, we found that textual similarity analysis might have the potential to complement the explicitly signaled links by recommending additional relations. In line with work in other fields, links between software artifacts have a potential to improve search and navigation in large software engineering projects.
2. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Markus Borg, Lund University, Sweden
Dietmar Pfahl, University of Tartu, Estonia
Per Runeson, Lund University, Sweden
• Third year PhD student
• MSc CS and engineering
• Software developer (2007-2010)
• Empirical research group
3. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Agenda
• Background and Context
– Information management
– Safety-critical development
– Impact analysis
• Goal and method of this study
• Results
• Future work
5. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Information management
• Large projects, much information
6. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Challenges
• A state of information overload
– Engineers cannot process all information
– Causes stress
– Obstructs decision making
• Poor findability
– More effort to navigate information landscape
7. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Intensified in safety development
• Safety standards mandate documentation
– Railroad, Nuclear, Process Industry, Machinery, Automotive
8. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Mandated documents in ISO 26262
9. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Work task: Impact Analysis (IA)
• Required by IEC 61508 before changes to production code
• Studied an industrial case
– Documented
– Reviewed during safety audits
Requirements
Tests
10. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Work task: Impact Analysis (2)
• Formal template
• Impact on code and non-code specified as traceability links
• Manual work
IMS
11. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Supporting the impact analysis?
[Figure labels: Work task (?), Reqs. DB, Code Repo, Test DB]
12. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Reuse knowledge from previous IAs
13. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Information in networks, so what?
• Search in hyperlinked structures well researched
– Also applied in software engineering (Karabatis et al., 2009)
– PageRank: Page et al. (1999)
– HITS algorithm: Kleinberg (1999)
14. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Networks of issue reports
• What type of networks can we find in issue databases?
?
16. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Issue databases under study
• Safety IMS (2000-2012)
– Industrial control system
– Mandated by strict processes
– Issues submitted by engineers
• Android IMS (2007-2012)
– OS for handheld devices
– Open source software
– Issues submitted to public database
17. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Link mining in the issue databases
• Safety IMS
– "Related cases" field in database
• Android IMS
– No separate field for linking issues
– Communication using comments (100,000+)
– Developers refer to other issues, stored as HTML hyperlinks
• Extracted using regular expressions
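The extraction step could look roughly like the sketch below. The slides do not show the regular expressions that were used, so the URL shape (`issues/detail?id=N`), the `example.org` host, the issue ids, and the function name `mine_links` are all assumptions made for illustration. Targets are kept in a set per source issue, so repeated hyperlinks to the same report yield a single directed, unweighted link.

```python
import re
from collections import defaultdict

# Assumed shape of a hyperlink to another issue; the real tracker's URLs may differ.
ISSUE_LINK = re.compile(
    r'<a\s[^>]*href="[^"]*issues/detail\?id=(\d+)[^"]*"', re.IGNORECASE
)


def mine_links(comments):
    """Build directed links from (source_issue_id, comment_html) pairs.

    Returns a dict mapping each linking issue to the set of issue ids it targets.
    """
    links = defaultdict(set)
    for source_id, html in comments:
        for match in ISSUE_LINK.finditer(html):
            links[source_id].add(int(match.group(1)))
    return dict(links)


# Hypothetical comments, not taken from either database.
comments = [
    (101, 'Looks like <a href="http://example.org/issues/detail?id=202">issue 202</a>.'),
    (101, 'Again, see <a href="http://example.org/issues/detail?id=202">202</a> and '
          '<a href="http://example.org/issues/detail?id=303">303</a>.'),
    (202, 'No links here.'),
    (303, 'Same cause as <a href="http://example.org/issues/detail?id=303">this one</a>?'),
]
links = mine_links(comments)
```

Here issue 101 ends up with two outgoing links even though it mentions issue 202 twice, and issue 303 keeps its self-link.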
22. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Example of sub-network
Bug star
One central issue report pointing at several others
Caused by duplicates
23. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Example of sub-network
Dense ring
Most issue reports are connected.
Caused by copy-paste comments
24. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Extracted networks
What do developers signal by creating HTML hyperlinks?
25. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Link semantics
• Indicate relationships with different certainty
– Related issue report (possibly, probably, definitely)
– Duplicate issue report (possibly, probably, definitely)
– Cloned issue report
• Misc. links
– Raising awareness of issue reports
– Release planning
– Links with the wrong target
• Links appear to carry meaning
27. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Contents of IA reports in the Safety IMS
Code
HW description
Misc. documents
Test case
User manual
Test case
28. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Mining IA reports in the Safety IMS
• ~ 5,000 impact analysis reports
Node types
• Issue reports
• Requirements
• Test specifications
• Hardware descriptions
Link types
• Related issue
• Specified by
• Verified by
• Needs update
• Impacted HW
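One way to hold such a typed network in memory is sketched below. The class name `SemanticNetwork`, the identifiers for node and link types, the link direction (from issue report to the affected artifact), and the example ids are assumptions for illustration; the slides only list the type names.

```python
from collections import Counter
from dataclasses import dataclass

# Type names adapted from the slide; the identifier spellings are assumptions.
NODE_TYPES = {"issue", "requirement", "test_spec", "hw_description"}
LINK_TYPES = {"related_issue", "specified_by", "verified_by", "needs_update", "impacted_hw"}


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    link_type: str


class SemanticNetwork:
    """Directed graph whose nodes and links both carry a type."""

    def __init__(self):
        self.nodes = {}     # node id -> node type
        self.links = set()  # set of Link, so duplicates collapse

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_link(self, source, target, link_type):
        if link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link_type}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.links.add(Link(source, target, link_type))

    def links_by_type(self):
        return Counter(link.link_type for link in self.links)


# Hypothetical artifacts, not taken from the Safety IMS.
net = SemanticNetwork()
net.add_node("issue-1", "issue")
net.add_node("issue-2", "issue")
net.add_node("req-10", "requirement")
net.add_node("test-20", "test_spec")
net.add_link("issue-1", "issue-2", "related_issue")
net.add_link("issue-1", "req-10", "specified_by")
net.add_link("issue-1", "test-20", "verified_by")
net.add_link("issue-2", "req-10", "specified_by")
counts = net.links_by_type()
```

Counting links per type, as `links_by_type` does, gives the same kind of breakdown as a per-link-type summary of the mined network.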
29. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Extracted semantic network
• 27,958 nodes
– ~26,000 issue reports
– ~3,000 other artifacts
• 28,230 links
– ~18,000 related issue
– ~4,000 specified by
– ~2,300 verified by
30. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Extracted semantic network – Circle layout
32. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
How can the networks be exploited?
33. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Neighbourhood search
Application 1: Search for connected artifacts
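A neighbourhood search can be sketched as a breadth-first traversal with a hop limit. Following links in both directions, the hop limit itself, and the artifact ids below are assumptions; the slides do not specify how the search would be bounded.

```python
from collections import deque


def neighbourhood(links, start, max_hops):
    """Return {artifact: hop distance} for everything within max_hops of start.

    links maps a source artifact to the set of artifacts it points at.
    Links are followed in both directions (an assumption of this sketch).
    """
    adjacency = {}
    for source, targets in links.items():
        for target in targets:
            adjacency.setdefault(source, set()).add(target)
            adjacency.setdefault(target, set()).add(source)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    del distance[start]
    return distance


# Hypothetical network.
links = {
    "issue-1": {"issue-2", "req-7"},
    "issue-2": {"test-3"},
    "issue-9": {"issue-2"},
    "test-3": {"hw-4"},
}
nearby = neighbourhood(links, "issue-1", max_hops=2)
```

With a limit of two hops, `hw-4` (three hops away) is left out, while `issue-9` is found through its incoming link to `issue-2`.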
34. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Centrality measures
Application 2: Identification of key artifacts (ranking)
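As one possible centrality measure, the sketch below computes PageRank by power iteration. The slides do not say which measure would be used, so the choice of PageRank, the damping factor of 0.85, the iteration count, and the example graph are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank nodes of a directed graph; links maps source -> set of targets."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical network: A points at three reports, two of which point at D.
links = {"A": {"B", "C", "D"}, "B": {"D"}, "C": {"D"}}
rank = pagerank(links)
key_artifact = max(rank, key=rank.get)
```

In this small graph `D` comes out on top because it collects rank from every other node; the ranks always sum to one.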
35. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Goal: Impact Recommender
1. Identify similar issues
2. Identify neighbours
3. Rank candidates
[Figure labels: Far away, Textual sim., High cent.]
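The three steps could be combined roughly as follows. This is only a sketch of the idea: Jaccard token overlap as the similarity measure, one-hop neighbours, in-degree as the centrality measure, and the scoring formula (similarity times in-degree) are all assumptions, as are the issue texts and ids.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(new_text, issue_texts, links, top_similar=2):
    """Rank artifacts that may be impacted by a new issue.

    issue_texts maps issue id -> text; links maps source id -> set of targets.
    """
    # Step 1: identify similar issues.
    new_tokens = tokens(new_text)
    similarity = {
        issue: jaccard(new_tokens, tokens(text)) for issue, text in issue_texts.items()
    }
    similar = sorted(similarity, key=similarity.get, reverse=True)[:top_similar]

    # In-degree as a simple stand-in for centrality.
    in_degree = {}
    for targets in links.values():
        for target in targets:
            in_degree[target] = in_degree.get(target, 0) + 1

    # Step 2: collect neighbours of the similar issues.
    # Step 3: rank candidates by similarity of the linking issue times centrality.
    scores = {}
    for issue in similar:
        for candidate in links.get(issue, ()):
            score = similarity[issue] * in_degree[candidate]
            scores[candidate] = max(scores.get(candidate, 0.0), score)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data.
issue_texts = {
    "issue-1": "pump controller crashes on startup",
    "issue-2": "display shows wrong unit for pressure",
    "issue-3": "controller crashes when pump restarts",
}
links = {
    "issue-1": {"req-10", "test-20"},
    "issue-2": {"req-30"},
    "issue-3": {"req-10"},
}
ranking = recommend("pump controller crashes after restart", issue_texts, links)
```

Here `req-10` is ranked first because both similar issues link to it, and `req-30` is never proposed because its only linking issue shares no words with the new report.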
36. Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Summary
• Link mining in IMSs can discover complex issue networks
– The process-heavy IMS contains more links
– Links among issue reports, created in comments by Android developers, typically signal relations
• Networks of issue reports can be extended by other artifacts
• Networked information enables better navigation
– Broaden search (following links)
– Sharpen search (better ranking)
Research agenda inspired by my experiences as a developer.
Knowledge workers spend too much time finding info
Findability definition: “the degree to which a system or environment supports navigation and retrieval”
Document driven environment, strict process requirements
Just as an example…
In a specific industrial case
Before changes to production code
In the safety audits to strengthen the safety case
IMS = Issue Management System
Multiple targets possible, not a mutual link by default.
All links are directed. Sometimes two issue reports are connected in both ways.
The links are not weighted. A link is established when a comment in an issue report targets another report. Additional hyperlinks to the same report do not generate new links.
Duplicates
You even see self-links
The dense link structure, and the self-links, were created when a developer posted the comment “Do we have one ground for many problems?” followed by links to seven reports. This comment was copy-pasted to all seven reports.
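A quick check of why this produces a dense ring with self-links, using hypothetical issue ids (the real ids are not given in these notes):

```python
# Hypothetical ids for the seven reports that received the copy-pasted comment.
reports = [f"issue-{n}" for n in range(1, 8)]

# The same comment, with links to all seven reports, sits in each of the seven.
links = {(source, target) for source in reports for target in reports}
self_links = {(source, target) for source, target in links if source == target}
```

Seven reports each linking to all seven give 49 directed links, seven of which are self-links.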
Qualitative analysis
Define related/duplicate/clone
However, our findings indicate that a majority of the links express a meaningful relation. One reason might be that creating a link requires some effort from the developer, so the links that do exist tend to be correct.
Information in IA reports is semi-structured. Free text, but answers provided in the context of questions.
Could also extract semantic information
Discovered a significant network structure
Semantic information
All artifacts on the perimeter of the circle. Few areas are not intertwined in complex ways.
Note that there is semantic information here as well, represented by the colors.
As made popular by a big American company in the search business.