Software ecosystems form the heart of modern companies’ collaboration strategies with end users, open source developers and other companies. An ecosystem consists of a core platform and a halo of user contributions that provide value to a company or project. To sustain both the number and the quality of these contributions, it is crucial for companies and contributors to understand how ecosystems tend to evolve and how they can be maintained successfully over time.
As a first step, this presentation explores the evolution characteristics of the statistical computing project GNU R, which is a successful, end-user programming ecosystem. We find that the ecosystem of user-contributed R packages has been growing steadily since R’s conception, at a significantly faster rate than core packages, yet each individual package remains stable in size. We also identified differences in the way user-contributed and core packages are able to attract an active community of users.
http://sail.cs.queensu.ca/publications/pubs/German-CSMR2013.pdf
The Evolution of the R Software Ecosystem (CSMR 2013)
1. The Evolution of the R Software Ecosystem
Daniel M. German
University of Victoria
Bram Adams
École Polytechnique de Montréal
Ahmed E. Hassan
Queen's University
3. An Ecosystem is ...
"a set of (1) businesses functioning as a unit and interacting with a shared market for (2) software and services, together with (3) the relationships among [the businesses]." (Jansen et al., ICSE '09)
26. The R Language
# Goals: A first look at R objects - vectors, lists, matrices, data frames.
# To make vectors "x" "y" "year" and "names"
x <- c(2,3,7,9)
y <- c(9,7,3,2)
year <- 1990:1993
names <- c("payal", "shraddha", "kritika", "itida")
# Accessing the 1st and last elements of y --
y[1]
y[length(y)]
# To make a list "person" --
person <- list(name="payal", x=2, y=9, year=1990)
person
# Accessing things inside a list --
person$name
person$x
# To make a matrix, pasting together the columns "year" "x" and "y"
# The verb cbind() stands for "column bind"
cbind(year, x, y)
# To make a "data frame", which is a list of vectors of the same length --
D <- data.frame(names, year, x, y)
nrow(D)
# Accessing one of these vectors
D$names
# Accessing the last element of this vector
D$names[nrow(D)]
# Or equally,
D$names[length(D$names)]
50. Mailing List Data Used
R-help
R-devel
MailMiner [Bettenburg et al.]
PostgreSQL
51. How does a Successful Ecosystem like R Evolve?
Package Characteristics
Package Evolution
Package Dependencies
Package Community
55. Documentation Files Dominate!
[Figure: proportion of files for a given extension (y-axis, 0.0 to 0.5) for Base, Recommended, Popular and Contributed packages. Extensions shown: rd, r, txt, hpp, rda, c, h, description, pdf, cpp, namespace, f, rdata, png, gif, java, rnw, save, html, xml, tex, s, q, citation. Annotations: documentation, source code.]
56. Extensive Package Documentation
[Figure: size of documentation per package, in lines of documentation files (.rd), for base, recommended, popular and contributed packages. Axis ticks: 0, 100, 1k, 10k, 100k. Annotated values: 5.3k, 3.6k, 1.7k, 0.6k.]
57. Contributed Packages Contain Less Code
[Figure: size of source code per package in SLOCs, for Base, Recommended, Popular and Contributed packages. Panels: r, all source code. Axis ticks: 0, 100, 1k, 10k, 100k, 1M. Annotated values: 7.3k, 3.5k, 1.8k, 0.7k.]
63. Fast Growth of Contributed Packages
[Figure: number of packages over time, 1998 to 2010, for Total, Base, Recommended, Popular and Contributed packages. Annotations: super-linear growth; conservative base/recommended evolution.]
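The slides do not say how the "super-linear growth" label was established. One possible check, sketched here purely as an assumption, is to fit a power law n = a * t^b on a log-log scale and see whether the exponent b exceeds 1. The yearly counts below are made up for illustration and are not the study's data; `years_elapsed`, `n_packages` and `growth_exponent` are hypothetical names.

```r
# Made-up yearly package counts, for illustration only (not the study's data)
years_elapsed <- 1:10
n_packages <- 2 * years_elapsed^2

# Fit n = a * t^b by linear regression on a log-log scale.
# The slope of the regression is the exponent b.
fit <- lm(log(n_packages) ~ log(years_elapsed))
growth_exponent <- unname(coef(fit)[2])

# An exponent above 1 suggests super-linear growth
is_super_linear <- growth_exponent > 1
is_super_linear
```

With real data the fit would not be exact, so the exponent should be read together with its confidence interval (for example via `confint(fit)`) before drawing a conclusion.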
64. Contributed Packages have Stable Size
[Figure: evolution of the size of source code per package over time (1998 to 2011), with one panel each for Base, Recommended, Popular and Contributed packages.]
67. The Less Core, the Less Releases
[Figure: number of releases per package against the proportion of packages (0 to 1), for Recommended, Popular and Contributed packages. Annotations: 50% had <=17 releases; 50% had <=3 releases.]
68. ... but Contributed Packages are Actively Maintained!
[Figure: date of latest release per package (2003 to 2011) against the proportion of packages (0 to 0.9), for Recommended, Popular and Contributed packages. Annotation: >90% of packages had a release in the last 2 years.]
72. Package Characteristics: extensive documentation; small contributed packages
Package Evolution: fast growth of contributed packages; stable package size; active maintenance
Package Dependencies
Package Community
76. Packages have Few Dependencies
[Figure: number of dependencies per package (0 to 25) against the proportion of packages (0 to 1), for Recommended, Popular and Contributed packages. Annotations: 1/3 has NONE; 1/4 has 1 dependency.]
79. Contributed Packages are Higher-Level
[Figure: number of dependents per package against the proportion of packages (0 to 1), for Recommended, Popular and Contributed packages. Annotations: NO dependents; 50% of popular packages have <=6 dependents.]
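The two dependency slides distinguish a package's dependencies (what it needs) from its dependents (what needs it). A minimal sketch of how both counts could be derived from a dependency list follows. The package names and the `deps` structure are hypothetical toy data, not taken from the study, and the talk does not describe its actual extraction tooling.

```r
# Hypothetical toy data: each element lists the packages that the named
# package depends on. None of these names refer to real packages.
deps <- list(
  pkgA = character(0),
  pkgB = c("pkgA"),
  pkgC = c("pkgA", "pkgB"),
  pkgD = character(0),
  pkgE = c("pkgC")
)

# Number of dependencies per package (outgoing edges)
n_dependencies <- lengths(deps)

# Number of dependents per package (incoming edges): count how often each
# package appears in the dependency lists of the other packages
tab <- table(factor(unlist(deps), levels = names(deps)))
n_dependents <- setNames(as.integer(tab), names(tab))

# Proportion of packages without any dependency / without any dependent
prop_no_dependencies <- mean(n_dependencies == 0)
prop_no_dependents <- mean(n_dependents == 0)

# Median number of dependents per package
median_dependents <- median(n_dependents)
```

Statements such as "1/3 has NONE" correspond to `prop_no_dependencies`, and statements such as "50% have <=6 dependents" correspond to a median like `median_dependents`, computed per package category.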
81. Package Characteristics: extensive documentation; small contributed packages
Package Evolution: fast growth of contributed packages; stable package size; active maintenance
Package Dependencies: few dependencies; contributed packages are higher level
Package Community
83. Contributed Packages Generate More User Traffic
[Figure: number of messages (#messages, 0 to 20000) over time, 1998 to 2010, for base, recommended, popular and contributed packages.]
84. Contributed Packages take over Developer Traffic
[Figure: number of messages (#messages, 0 to 2500) over time, 1998 to 2010, for base, recommended, popular and contributed packages.]
93. Starting up a Community takes 1 Year
[Figure: time needed (instant, day, week, month, year, 5 year, 10 year) to reach the 1st, 10th, 100th and 1000th message, for base, recommended, popular and contributed packages. Annotations: 3 months; 1 year; 5 months slower; 44.9% gets here; only 6.5% gets this far.]
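The community slide measures how long a package takes to reach its 10th, 100th or 1000th mailing list message. A minimal sketch of that measurement, assuming the posting dates of all messages about one package are already available, could look as follows. The dates are made up and `days_to_nth_message` is a hypothetical helper, not part of MailMiner or of the study's tooling.

```r
# Made-up posting dates of the messages that mention one package
msg_dates <- as.Date("2009-01-15") +
  c(0, 3, 10, 40, 41, 75, 120, 200, 290, 365, 400)

# Days elapsed between the first and the n-th message about a package,
# or NA if the package never reached n messages
days_to_nth_message <- function(dates, n) {
  dates <- sort(dates)
  if (length(dates) < n) {
    return(NA_real_)
  }
  as.numeric(dates[n] - dates[1])
}

days_to_10th <- days_to_nth_message(msg_dates, 10)    # 365
days_to_100th <- days_to_nth_message(msg_dates, 100)  # NA
```

Applying the helper to every package and taking the share of non-NA results would give percentages of the kind annotated on the slide, such as the fraction of packages that ever reach a given message count.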
95. Package Characteristics: extensive documentation; small contributed packages
Package Evolution: fast growth of contributed packages; stable package size; active maintenance
Package Dependencies: few dependencies; contributed packages are higher level
Package Community: strong competition for attention; building a community takes a year
96. So What?
• How do contributors deal with the fight for attention?
– What is their motivation?
– How much effort do they spend on their package?
• How does a package become popular/recommended?
– Do bloggers/books have an impact?
– Or is it the other way around?
• How do R-forge and the core team ensure high quality releases without broken packages?
• ...
98. "Desktop ecosystems for end-user programming are the holy grail of software platforms!" (Bosch, SPLC '09)
101. RELENG 2013: 1st International Workshop on Release Engineering
May 20, 2013, San Francisco, USA
http://releng.polymtl.ca