Presentation by Eleni Constantinou (joint work with Tom Mens, Software Engineering Lab, UMONS) at the BENEVOL 2016 Software Evolution Research Seminar, Utrecht University, The Netherlands
2. Introduction
Software ecosystem
• Collection of software projects that are developed and
evolve together in the same environment [1]
Ecosystem environment
• Development team ⇒ Social aspect
• Source code artefacts ⇒ Technical aspect
Modifications
• Social: Contributors joining/leaving
• Technical: New/obsolete source code files
[1] M. Lungu. Towards reverse engineering software ecosystems. Int'l Conf. Software Maintenance, pages 428-431, 2008.
3. Introduction
Evolution
• Longevity
• Growth
Ecosystem sustainability
Long-term effect of social/technical modifications
A sustainable software ecosystem
can increase or maintain its
user/developer community over
longer periods of time and can
survive inherent changes such
as new technologies or new
products (e.g. from competitors)
that can change the population
(the community of users,
developers etc) [2]
[2] D. Dhungana, I. Groher, E. Schludermann, S. Biffl. Software ecosystems vs. natural ecosystems: learning from the ingenious mind of nature. Eur. Conf. on Software Architecture: Companion Volume, pages 96-102, 2010.
8. Source Code Files
Refactoring activities
• Renamed files
• Moved files
Validity of renewal,
abandonment
measurements
9. Research Questions
RQ1 How does the ecosystem grow over time?
RQ2 How do the technical artefacts of the
ecosystem evolve?
RQ3 How does the ecosystem’s contributor
team evolve?
RQ4 How do changes in the contributor team
impact the technical artefacts?
10. Dataset
• Ruby ecosystem in GitHub
• GHTorrent dataset [2] (2016-09-05 dump)
• Timespan: October 2007 – September
2016
• Time unit: year quarters
• Commit activity
• Three levels: Base
project/Forks/Ecosystem
[2] G. Gousios. The GHTorrent dataset and tool suite. Working Conf. Mining Software Repositories, pages 233-236, 2013.
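The quarter-based time unit can be illustrated with a small sketch that maps a commit date to a quarter number. It assumes that quarter 1 covers October to December 2007, the first three months of the stated timespan; the slides do not give the exact quarter boundaries, and `quarter_index` is a hypothetical helper name.

```python
from datetime import date

# Assumption: quarter 1 = October-December 2007 (start of the dataset timespan).
START_YEAR, START_MONTH = 2007, 10


def quarter_index(d: date) -> int:
    """Return the 1-based index of the three-month period that contains d."""
    months_elapsed = (d.year - START_YEAR) * 12 + (d.month - START_MONTH)
    if months_elapsed < 0:
        raise ValueError("date precedes the start of the dataset")
    return months_elapsed // 3 + 1
```

Under this assumption, a commit dated December 2011 falls in quarter 17, which agrees with the "quarter 17 (December 2011)" label used later in the slides.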
11. Dataset Perils – Mitigation & filters
Filter | Description | Perils
1 | Eliminate non-Ruby projects |
2 | Eliminate inactive projects | Low project activity, inactive project, repository is not a project
3 | Eliminate isolated projects | Personal projects
4 | Eliminate forks without merges to the base project | Inactive project, few projects use pull requests
5 | Eliminate short-lived contributors | Noise of occasional/short-lived contributors
6 | Only consider source code files in commits | Non-software development project
13. RQ1 How does the ecosystem grow over time?
[Figure panels: Commits, Lines of Code (LOC)]
14. RQ1 How does the ecosystem grow over time?
[Figure panels: Base Projects, Forks]
Quarter 25 (November 2013-February 2014)
Small number of new projects
15. RQ1 How does the ecosystem grow over time?
[Figure: ProjectRenewal and ProjectAbandonment per quarter (x-axis: Quarters, 5 to 30; y-axis: 0 to 1). Panels: Base Projects, Forks]
Before quarter 25
• Base Projects: 30-40% new projects, less than 10% abandoned
• Forks: more than 60% new forks
16. RQ1 How does the ecosystem grow over time?
Evidence of contributor migration to JavaScript
After quarter 17 (December 2011)
Larger growth of JavaScript ecosystem
17. RQ2 How do the technical artefacts (files) evolve?
[Figure panels: Base Projects, Forks]
Base projects: Bulk of development activity
After quarter 25: decrease of new files
18. RQ3 How does the contributor team evolve?
[Figure panels: Base Projects, Forks, Ecosystem]
Contributors leave forks but continue to participate in base projects
After quarter 20: more Leavers; fewer Joiners
20. RQ3 How does the contributor team evolve?
Ecosystem | Active in Ruby | Abandoned Ruby | Percentage
JavaScript | 18,038 | 13,814 | 77%
Python | 10,707 | 8,131 | 76%
Java | 7,363 | 5,132 | 70%
C | 6,406 | 4,174 | 65%
Most Ruby Leavers…
• Worked in JavaScript projects in parallel to Ruby projects
• Continued to work in JavaScript after abandoning Ruby
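The Percentage column on this slide can be reproduced with a short sketch. It assumes the percentage is the number of contributors who abandoned Ruby divided by the number who were active in Ruby, for each other ecosystem, rounded to the nearest integer. The slide does not state this definition, and `abandonment_share` is a hypothetical helper name.

```python
# Counts copied from the slide: contributors also active in another ecosystem.
active_in_ruby = {"JavaScript": 18038, "Python": 10707, "Java": 7363, "C": 6406}
abandoned_ruby = {"JavaScript": 13814, "Python": 8131, "Java": 5132, "C": 4174}


def abandonment_share(active: int, abandoned: int) -> int:
    """Percentage of contributors who abandoned Ruby, rounded to an integer."""
    return round(100 * abandoned / active)


shares = {
    language: abandonment_share(active_in_ruby[language], abandoned_ruby[language])
    for language in active_in_ruby
}
print(shares)
```

With the counts above, this yields 77, 76, 70 and 65 for JavaScript, Python, Java and C, matching the slide.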
21. RQ4 How do changes in the contributor team
impact the technical artefacts?
Diversity index of Leavers
(relative entropy)
Increased Leaver specialization over time:
Large contribution to important projects
CONTRIBUTIONS TO OTHER ECOSYSTEMS OF RUBY ABANDONERS
Language | Active in Ruby | Language | Abandoned Ruby
JavaScript | 18,038 | JavaScript | 13,814
Shell | 10,707 | Shell | 8,982
Python | 10,211 | HTML | 8,237
CSS | 9,875 | Python | 8,131
Java | 7,363 | CSS | 8,082
HTML | 7,056 | Java | 5,132
C | 6,406 | C | 4,174
PHP | 5,839 | Go | 3,993
VimL | 5,050 | VimL | 3,768
C++ | 4,649 | PHP | 3,517
CoffeeScript | 3,946 | C++ | 3,318
Go | 3,334 | CoffeeScript | 2,670
Objective-C | 3,095 | Objective-C | 1,993
Perl | 2,408 | Emacs Lisp | 1,289
Puppet | 1,862 | Perl | 1,276
[...] [14] who observed that newcomers do not tend to become [...]ers. Finally, the abandonment rate exceeds the joining rate of the ecosystem after quarter 25, and the number of active developers is reduced. Combined with our observations concerning the ecosystem projects in Section III, this reveals evidence of a possible correlation between developer abandonment and project abandonment. To further investigate the behavior of contributors abandoning the Ruby ecosystem, we measured their activity on GitHub projects with another main [...]
[...] diversity can be measured according to Shannon's entropy or the Simpson index, while the specialization of a species in one level relative to the species in the other level is measured by the relative entropy (a.k.a. Kullback-Leibler divergence) [15]. By measuring the specialization of Leavers, we [...] the relative risk they cause to the ecosystem (according to their relative contribution) until they abandoned the ecosystem. As explained in [16], the specialization of a contributor expressed in terms of relative entropy is defined as:
$$S_{c_j} = \sum_{i=1}^{n} \frac{w_{ij}}{C_j} \left( \log \frac{w_{ij}}{C_j} - \log \frac{P_i}{W} \right)$$
where $n$ is the number of projects and $m$ the number of contributors in the ecosystem, $w_{ij}$ the workload of contributor $c_j$ to project $p_i$ counted in number of lines of code, $P_i = \sum_{j=1}^{m} w_{ij}$ is the total workload of all contributors to project $p_i$, $C_j = \sum_{i=1}^{n} w_{ij}$ is the total workload of contributor $c_j$ to all projects she is contributing to, and $W = \sum_{i=1}^{n} P_i$ is the total ecosystem workload.
We computed the contributor specialization with a constraint on the project and ecosystem workload. More precisely, we consider the ecosystem and contributor workload for the quarters where the contributor was active in the ecosystem, that is, from the quarter of her first contribution until the quarter before abandoning the ecosystem. [...]
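The relative-entropy specialization defined in the excerpt above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the workload is given as a plain nested list of lines of code (projects as rows, contributors as columns), the natural logarithm is used, and terms with zero workload are skipped (the usual 0·log 0 = 0 convention). The function name `specialization` is hypothetical.

```python
import math


def specialization(workload, j):
    """Relative-entropy specialization of contributor j.

    workload[i][j] is the number of lines of code contributed by
    contributor j to project i (an assumed input layout).
    """
    n = len(workload)
    project_totals = [sum(row) for row in workload]            # P_i
    contributor_total = sum(workload[i][j] for i in range(n))  # C_j
    ecosystem_total = sum(project_totals)                      # W
    if contributor_total == 0:
        raise ValueError("contributor has no workload in the ecosystem")

    score = 0.0
    for i in range(n):
        w = workload[i][j]
        if w == 0:
            continue  # assumption: a zero-workload term contributes nothing
        share = w / contributor_total
        score += share * (
            math.log(share) - math.log(project_totals[i] / ecosystem_total)
        )
    return score
```

A contributor whose workload is spread across projects in the same proportions as the ecosystem as a whole gets a score of 0; the score grows as her effort concentrates on projects that receive a small share of the total workload. To apply the quarter constraint described in the excerpt, the workload matrix would be restricted to the quarters in which the contributor was active before calling the function.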
22. Threats to validity
• Multiple user accounts
• Less common within the same GitHub repository
• Identity merging [3]
• Programming language identification
• GitHub dataset
• Filters to eliminate noise
• Activity outside GitHub
• Merged pull requests appear as non-merged in GitHub
• Not all activity results from registered users
[3] M. Goeminne and T. Mens, “A comparison of identity merge algorithms for software repositories,” Science of Computer
Programming, vol. 78, no. 8, pages 971–986, 2013
23. Conclusion
Ruby software ecosystem in GitHub
• Investigate the permanent modifications of
the socio-technical network
• Impact of permanent changes in contributor
team on the technical artefacts
• Preliminary evidence about contributor
migration across different ecosystems
(Ruby → JavaScript)
Identify risks in project/ecosystem evolution
due to important team changes
24. Ongoing/Future Work
Contributor migration across different ecosystems
Advanced socio-technical analyses
• Socio-technical congruence
• Socio-technical debt
• Their effect on the ecosystem evolution