Written communications recorded through channels such as mailing lists or issue trackers, but also code co-changes, have been used to identify emerging collaborations in software projects. Also, such data has been used to identify the relation between developers’ roles in communication networks and source code changes, or to identify mentors aiding newcomers to evolve the software project. However, results of such analyses may be different depending on the communication channel being mined. This paper investigates how collaboration links vary and complement each other when they are identified through data from three different kinds of communication channels, i.e., mailing lists, issue trackers, and IRC chat logs. Also, the study investigates how such links overlap with links mined from code changes, and how the use of different sources would influence (i) the identification of project mentors, and (ii) the presence of a correlation between the social role of a developer and her changes. Results of a study conducted on seven open source projects indicate that the overlap of communication links between the various sources is relatively low, and that the application of networks obtained from different sources may lead to different results.
How Developers’ Collaborations Identified from Different Sources Tell us About Code Changes
1. How Developers’ Collaborations Identified from Different Sources Tell us About Code Changes
Sebastiano Panichella, Gabriele Bavota, Massimiliano Di Penta, Gerardo Canfora, Giuliano Antoniol
10/3/2014
2. Outline
Context and Motivations
- Software Development
Case Study
- Seven Open Source Projects
Results
- Evaluation of Developers Collaboration Identified from Different Sources
- Application of Networks Obtained from Different Sources
3. Different Sources of Information…
“…In everybody’s experience, different communication channels play different, sometimes complementary sometimes alternative, roles: news can be gathered (and shared) from the radio, by reading a newspaper, watching a TV broadcast or surfing blogs.”
17. Focusing on a single source:
I would Bavota as first author…
18. Focusing on a single source:
I would Panichella as first author…
19. Merging all the sources:
I would Panichella as first author…
I say Panichella..
I would Panichella as first author…
I would Bavota as first author…
I say Bavota..
27. How Do Developers’ Collaboration Networks Identified from Different Sources Differ?
IRC CHAT LOG
ISSUE TRACKER
MAILING LIST
VERSIONING SYSTEM
28. Case Study
Goal: investigating how different communication channels provide different views of developers’ interactions, and how the use of such information in recommender systems could produce different results.
Research questions:
• RQ1: To what extent do developers discuss through the different communication channels?
• RQ2: How do the inferred links between developers overlap when using different sources of information?
• RQ3: How do social network metrics change when using different sources, and how would this impact on using such information to build recommenders?
29. Context - Objects
Project          Period                 KLOC
Apache HTTPD     June 2011–June 2013    2,021–2,240
Apache CXF       June 2011–June 2013    593–771
Hibernate        June 2011–June 2013    984–1,096
Infinispan       June 2011–June 2013    146–286
Apache Lucene    June 2011–June 2013    198–437
Samba            June 2010–June 2012    1,278–1,426
Weld             June 2011–June 2013    108–139
39. RQ1: To what extent do developers discuss through the different communication channels?
[Figure: results for Apache Httpd, Apache CXF, Hibernate, Infinispan, Apache Lucene, Samba, and Weld]
Developers mainly use two out of three communication channels, whereas the third one is only used sporadically.
While in the past developers used emails as their main communication channel, nowadays they heavily use chats or issue trackers.
40. Developers Overlap between Different Sources
Projects: Apache Httpd, Apache CXF, Hibernate, Infinispan, Apache Lucene, Samba, Weld
ISSUE and CHAT (35%) < ISSUE and MAIL (56%)
MAIL and CHAT (50%) < MAIL and ISSUE (86%)
41. RQ2: How do the inferred links between developers overlap when using different sources of information?
Projects: Apache Httpd, Apache CXF, Hibernate, Infinispan, Apache Lucene, Samba, Weld
ISSUE and CHAT (35%) < ISSUE and MAIL (56%)
MAIL and CHAT (50%) < MAIL and ISSUE (86%)
42. RQ2: How do the inferred links between developers overlap when using different sources of information?
Projects: Apache Httpd, Apache CXF, Hibernate, Infinispan, Apache Lucene, Samba, Weld
ISSUE and CHAT (26%) < ISSUE and MAIL (38%)
MAIL and CHAT (20%) < MAIL and ISSUE (30%)
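The slides report link overlap between pairs of sources as percentages but do not spell out the formula. The following is a minimal sketch of one plausible way to compute such a figure, assuming links are undirected developer pairs and that overlap is the share of links from one source that also appear in the other. The function names and the sample developer names are hypothetical and are not taken from the study.

```python
def normalize(links):
    """Treat links as undirected: (a, b) and (b, a) count as the same link."""
    return {frozenset(pair) for pair in links if pair[0] != pair[1]}


def overlap_percentage(links_a, links_b):
    """Percentage of links in source A that also appear in source B.

    This definition is an assumption made for illustration only.
    """
    a, b = normalize(links_a), normalize(links_b)
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)


# Hypothetical links mined from two channels.
issue_links = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "dave")]
mail_links = [("bob", "alice"), ("carol", "dave"), ("erin", "alice")]

print(overlap_percentage(issue_links, mail_links))
```

Note that this measure is asymmetric: swapping the two sources changes the denominator, so the order in which a pair of channels is compared matters.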
43. During an IRC Chat Meeting
“is there a better way? dunno like I said this is brainstorming and I have not given lots of thought to these cases”
“but we also need to create the attributes and values in the entity binding..”
45. During an IRC Chat Meeting
“is there a better way? dunno like I said this is brainstorming and I have not given lots of thought to these cases”
“however planning a pure standalone test suite would make things easier...”
1) Brainstorming
2) Planning (e.g. Testing activities)
46. During an IRC Chat Meeting
“okay I think it is a bug and I’m going to create a jira first”
“however planning a pure standalone test suite would make things easier...”
1) Brainstorming
2) Planning (e.g. Testing activities)
3) Open an Issue
47. Similarity Measure of Topics Extracted from Different Communication Channels
Project          issues vs. mails   issues vs. chat   mails vs. chat
Apache Httpd     0.17               0.09              0.06
Apache CXF       0.86               0.11              0.01
Hibernate        0.11               0.02              0.03
Infinispan       0.07               0.03              0.03
Apache Lucene    0.08               0.03              0.02
Samba            0.06               0.02              0.02
Weld             0.11               0.04              0.03
48. Similarity Measure of Topics Extracted from Different Communication Channels
Project          issues vs. mails       issues vs. chat       mails vs. chat
Apache Httpd     0.17               >   0.09              >   0.06
Apache CXF       0.86               >   0.11              >   0.01
Hibernate        0.11               >   0.02                  0.03
Infinispan       0.07               >   0.03              ≥   0.03
Apache Lucene    0.08               >   0.03              >   0.02
Samba            0.06               >   0.02              ≥   0.02
Weld             0.11               >   0.04              >   0.03
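The slides do not say which similarity measure produced the values above. As an illustration only, the sketch below uses the Jaccard index over sets of topic terms, which is a swapped-in technique and not necessarily the one used in the study. The term sets are hypothetical.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index between two collections of topic terms (0.0 to 1.0)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical topic terms extracted from each channel of one project.
issue_terms = {"bug", "patch", "test", "release"}
mail_terms = {"release", "vote", "patch", "license"}
chat_terms = {"test", "meeting", "branch"}

for name, (x, y) in {
    "issues vs. mails": (issue_terms, mail_terms),
    "issues vs. chat": (issue_terms, chat_terms),
    "mails vs. chat": (mail_terms, chat_terms),
}.items():
    print(f"{name}: {jaccard(x, y):.2f}")
```

Any set- or vector-based similarity could be substituted here. The point of the sketch is only that each channel pair yields one score per project, which can then be compared row by row as on the slide.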
50. RQ3: How do social network metrics change when using different sources, and how would this impact on using such information to build recommenders?
Social Network Metrics:
- Identifying high-degree developers;
- Identifying mentors (Canfora et al. - FSE 2012).
Social Network Metrics vs. Code Changes:
- Correlation between social roles and change activities (replicating the study by Bird et al. - MSR 2006).
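To make the two kinds of analysis listed above concrete, here is a minimal sketch that computes each developer's degree in a communication network and then correlates degree with the number of changes. It assumes undirected links and uses Spearman rank correlation. The choice of correlation statistic, the function names, and the sample data are assumptions for illustration and are not taken from the slides.

```python
from collections import Counter


def degrees(links):
    """Degree of each developer in an undirected network without self-loops."""
    deg = Counter()
    for a, b in {tuple(sorted(link)) for link in links if link[0] != link[1]}:
        deg[a] += 1
        deg[b] += 1
    return deg


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical communication links and per-developer commit counts.
links = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("bob", "carol")]
commits = {"alice": 40, "bob": 12, "carol": 15, "dave": 1}

deg = degrees(links)
devs = sorted(commits)
rho = spearman([deg[d] for d in devs], [commits[d] for d in devs])
print(deg.most_common(2), round(rho, 2))
```

Running the same computation once per source (mail, issue tracker, chat) would show whether the set of high-degree developers and the strength of the correlation depend on the channel being mined.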
53. Mentors Overlap between Different Sources
Projects: Apache Httpd, Apache CXF, Hibernate, Infinispan, Apache Lucene, Samba, Weld
Considering ALL SOURCES (41%) < MAIL and ISSUE (47%)
55. High Degree Contributors Overlap between Different Sources
Projects: Apache Httpd, Apache CXF, Hibernate, Infinispan, Apache Lucene, Samba, Weld
Considering ALL SOURCES (41%) < MAIL and ISSUE (47%)
Considering ALL SOURCES (36%) < MAIL and ISSUE (46%)
56. Ohloh Kudos Score
Kudos score: level of appreciation or respect of a developer working for a project. It is based on the judgement of other project members.
http://www.ohloh.net/p/apache/contributors
57. Issue, Chat and Email to Identify Leaders
[Bar chart: Precision in Recommending Leaders, by source (MAIL, ISSUE, CHAT), for Hibernate, Samba, Apache Lucene, and Apache HTTPD; x-axis from 0% to 90%; bar labels range from 0% to 80%]
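Precision here can be read in the standard information-retrieval sense: the fraction of recommended developers who are actual leaders. The sketch below illustrates that calculation under the assumption that a reference set of leaders is available (for example, derived from an external score such as the Kudos score). All names and the recommended lists are hypothetical.

```python
def precision(recommended, actual):
    """Fraction of recommended developers who are in the reference set."""
    recommended = set(recommended)
    if not recommended:
        return 0.0
    return len(recommended & set(actual)) / len(recommended)


# Hypothetical reference leaders and per-channel recommendations.
leaders = {"alice", "bob", "carol", "dave", "erin"}
recommended_by_source = {
    "MAIL": ["alice", "bob", "carol", "frank", "grace"],
    "ISSUE": ["alice", "frank", "grace", "heidi", "ivan"],
    "CHAT": ["alice", "bob", "carol", "dave", "frank"],
}

for source, names in recommended_by_source.items():
    print(f"{source}: {precision(names, leaders):.0%}")
```

Comparing the per-source values side by side mirrors the chart: the same recommender can look better or worse depending on which channel supplied the network.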
58. Replication of the Work by Bird et al.
Bird et al. - MSR 2006:
“Developers who actually commit changes, play much more significant roles in the email community than non-developers”