This document summarizes research on software ecosystems and their health. It discusses how social and technical factors influence ecosystem evolution and survival. Studies examined package dependency networks and found packages depend on many indirect dependencies over time. Social changes like developer abandonment can greatly impact ecosystems. Developers who commit and communicate less frequently are more likely to abandon projects sooner. Both technical and social activity are needed for package and developer longevity. Current work aims to merge developer identities across platforms and forecast inactivity to help monitor ecosystem health.
5. SECO-Assist
Automated Assistance for
Developing Software in
Ecosystems of the Future
secoassist.github.io
Inter-university research project
Tom Mens
University of Mons
Anthony Cleve
Université de Namur
Coen De Roover
Vrije Universiteit Brussel
Serge Demeyer
University of Antwerp
10. Evolution of package
dependency networks
A Decan, T Mens (2018) An Empirical Comparison of
Dependency Network Evolution in Seven Software Packaging
Ecosystems. Empirical Software Engineering
Seven package dependency networks extracted using open source discovery
service http://libraries.io (CC BY-SA 4.0)
830K packages – 5.8M package versions – 20.5M dependencies
11. Package changes are frequent
Findings
• The number of package updates grows over time.
• More than 50% of package releases are updated within 2 months.
• Required and young packages are updated more frequently.
Changeability index:
Maximal value n such that there exist n packages having
been updated at least n times during the last month.
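The changeability index defined above has the same shape as an h-index over monthly update counts. A minimal sketch, assuming the per-package update counts for the last month are already available; the function name and the sample package names are hypothetical, not taken from the study:

```python
def changeability_index(update_counts):
    """Largest n such that at least n packages were updated at least n times."""
    ranked = sorted(update_counts, reverse=True)
    n = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break
        n = rank
    return n


# Hypothetical update counts for the last month (illustrative data only).
updates_last_month = {"pkg-a": 5, "pkg-b": 3, "pkg-c": 3, "pkg-d": 1, "pkg-e": 0}
index = changeability_index(updates_last_month.values())
print(index)
```

With this sample data, three packages were updated at least three times, but there are not four packages with at least four updates, so the index is 3.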
CRAN differs due to rolling release policy:
“Submitting updates should be done
responsibly and with respect for the
volunteers’ time. Once a package is
established, ‘no more than every 1–2
months’ seems appropriate.”
12. Package changes are frequent
Package updates may cause many maintainability issues
or even failures in dependent packages.
"Especially with respect to package
dependencies, the risk of things breaking at
some point due to the fact that a version of a
dependency has changed without you
knowing about it is immense. That actually
cost us weeks and months in a couple of
professional projects I was part of."
13. Most packages depend on other
packages
Findings
• 60% to 80% of all packages are connected.
• A stable minority (20%) of required packages collect over 80% of all reverse dependencies.
• The number of npm dependencies grows much faster than in the other ecosystems.
Reusability index:
Maximal value n such that there exist n required packages having at least n dependent packages.
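The reusability index can be sketched from a list of dependency edges. This is an illustrative sketch only: it assumes edges are given as (dependent, required) pairs and that "dependent packages" means direct dependents; the function name and sample data are hypothetical.

```python
def reusability_index(dependencies):
    """Largest n such that n required packages each have at least n dependents.

    `dependencies` is an iterable of (dependent, required) pairs.
    Assumption: only direct dependents are counted.
    """
    dependents = {}
    for dependent, required in dependencies:
        dependents.setdefault(required, set()).add(dependent)

    counts = sorted((len(d) for d in dependents.values()), reverse=True)
    n = 0
    for rank, count in enumerate(counts, start=1):
        if count < rank:
            break
        n = rank
    return n


# Hypothetical dependency edges (illustrative data only).
edges = [
    ("app1", "lib-a"), ("app2", "lib-a"), ("app3", "lib-a"),
    ("app1", "lib-b"), ("app2", "lib-b"),
    ("app3", "lib-c"),
]
reuse = reusability_index(edges)
print(reuse)
```

Here lib-a has three dependents, lib-b two and lib-c one, so two required packages have at least two dependents each and the index is 2.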
14. Package changes may have
important impact
March 2016
Unexpected removal of left-pad
Caused > 2% of all packages to break
(> 5,400 packages)
November 2010
Release 0.5.0 of i18n broke dependent package
ActiveRecord
Transitively required by >5% of all packages
16. Most of the complexity is deeply hidden …
… in the transitive dependencies
Proportion of top-level packages by depth of dependency tree
Over 50% of top-level packages have a deep dependency tree.
Ecosystem complexity
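To make "depth of dependency tree" concrete, here is a small sketch. It rests on two assumptions that the slide does not state: a top-level package is one that no other package requires, and depth is the largest shortest-path distance from the package to any of its transitive dependencies (which also keeps the computation well defined if the graph contains cycles). All names and data are hypothetical.

```python
from collections import Counter, deque


def top_level_packages(requires):
    """Packages that appear as keys but are never required by another package."""
    required = {dep for deps in requires.values() for dep in deps}
    return sorted(p for p in requires if p not in required)


def dependency_depth(package, requires):
    """Breadth-first search; returns the distance to the farthest dependency."""
    depth = {package: 0}
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dep in requires.get(current, ()):
            if dep not in depth:
                depth[dep] = depth[current] + 1
                queue.append(dep)
    return max(depth.values())


# Hypothetical dependency graph: package -> packages it directly requires.
requires = {
    "app": ["web", "log"],
    "web": ["http"],
    "http": ["log"],
    "log": [],
    "cli": ["log"],
}
roots = top_level_packages(requires)
depth_counts = Counter(dependency_depth(p, requires) for p in roots)
proportions = {d: c / len(roots) for d, c in depth_counts.items()}
print(roots, proportions)
```

In this toy graph the top-level packages are app (depth 2) and cli (depth 1), so each depth accounts for half of the top-level packages.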
17. Package changes may have
important impact
Evolution of 5-Impact Index
Findings
• Dependent packages have few direct
but many transitive dependencies.
• Ratio of indirect over direct
dependencies increases over time.
P-Impact Index:
Number of packages that are transitively required by at least P% of all packages.
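The P-Impact Index can be sketched by computing, for every package, its set of transitive dependencies and then counting how many packages are reached by at least P% of the ecosystem. This is a naive illustration under stated assumptions, not the study's implementation: the graph is a plain dictionary, "all packages" means every package that appears in it, and the function names are hypothetical.

```python
def transitive_dependencies(package, requires):
    """All packages reachable from `package` through direct or indirect requirements."""
    seen = set()
    stack = list(requires.get(package, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(requires.get(dep, ()))
    seen.discard(package)  # a cycle should not make a package depend on itself
    return seen


def p_impact_index(requires, p):
    """Number of packages transitively required by at least p% of all packages."""
    packages = set(requires) | {d for deps in requires.values() for d in deps}
    dependents = dict.fromkeys(packages, 0)
    for pkg in packages:
        for dep in transitive_dependencies(pkg, requires):
            dependents[dep] += 1
    # Integer comparison avoids floating-point rounding at the threshold.
    return sum(1 for count in dependents.values() if count * 100 >= p * len(packages))


# Hypothetical dependency graph: package -> packages it directly requires.
requires = {
    "app": ["web", "log"],
    "web": ["http"],
    "http": ["log"],
    "log": [],
    "cli": ["log"],
}
impact_40 = p_impact_index(requires, 40)
print(impact_40)
```

In this toy graph of five packages, log is transitively required by four packages and http by two, so the 40-Impact Index is 2. The per-package traversal is quadratic in the worst case; a real ecosystem with hundreds of thousands of packages would need a more careful algorithm.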
27. Evolution of package dependency networks
E Constantinou, T Mens (2017) Socio-Technical Evolution of the
Ruby Ecosystem in GitHub. SANER 2017
26K packages/projects, 69K forks
76K contributors
5M commits
32. Evolution of package dependency networks
E Constantinou, T Mens (2017) An Empirical Comparison of
Developer Retention in the RubyGems and npm Software
Ecosystems. Innovations in Systems and Software Engineering
70K packages/projects
32K contributors
3M commits
1.5M messages
179K packages/projects
64K contributors
8M commits
4M messages
33. SECO health – Survival
Socio-technical activity
• Intensity
• Frequency
• Inactivity length
Survival analysis
35. SECO health – Developer survival
Population: all developers in an ecosystem
Event: a developer abandons the ecosystem
Developers tend to abandon the ecosystem sooner
if they:
do not communicate
communicate less intensively
communicate less frequently
do not communicate for a longer period
[Figure: developer survival curves for npm and RubyGems. x-axis: duration of commit activity (months); y-axis: survival probability. First pair of panels, curves: Social inactivity, Social activity, Social abandoner. Second pair of panels, curves: Very Short, Short, Long, Very Long.]
36. SECO health – Developer survival
Developers tend to abandon the ecosystem sooner
if they:
commit less intensively
commit less frequently
do not commit for longer periods
[Figure: developer survival curves for npm and RubyGems. x-axis: duration of commit activity (months); y-axis: survival probability. First pair of panels, curves: Very Weak, Weak, Strong, Very Strong. Second pair of panels, curves: Very Short, Short, Long, Very Long.]
38. SECO health – Package survival
Population: all packages in an ecosystem
Event: commit inactivity of a package
Packages tend to become inactive sooner if the developers contributing
to these packages:
do not communicate
communicate less intensively
communicate less frequently
do not communicate for a longer period
39. SECO health – Package survival
Packages tend to become inactive sooner if the developers contributing
to these packages:
commit less intensively
commit less frequently
do not commit for longer periods
40. SECO health – Survival
Intense and frequent
commit activity is not enough
…
Intense and frequent
messaging activity is also
necessary
Technical Diversity: different platforms, different programming languages, different application domains, different packages with similar functionality
Community Smells: Lone Wolves, Isolated Teams, Communication Problems
Contributor Abandonment: Rage quitting
npm and NuGet are more subject to package updates.
CRAN is less subject to package updates.
The package leftpad essentially contains a few lines of source code but has thousands of dependent projects, including Node and Babel.
When its developer decided to unpublish all his modules from npm, this had important consequences, “almost breaking the internet”.
npm, March 2016
Unexpected removal of left-pad caused > 2% of all packages to break (> 5,400 packages)
RubyGems, November 2010
Release 0.5.0 of i18n broke dependent package ActiveRecord, transitively required by > 5% of all packages (930)
Survival analysis: study the factors affecting the time to an event (such as childbirth or recovery from a disease).
Estimate the survival rate of a population over time, taking censoring into account.
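As an illustration of estimating a survival rate with censoring, the sketch below implements the Kaplan-Meier estimator in plain Python. The slides do not say which estimator was used, so Kaplan-Meier is an assumption here, chosen because it is a standard choice; the sample durations are invented. A developer who is still active at the end of the observation period is censored: the duration counts toward the population at risk, but no abandonment event is recorded.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with an observed event.

    durations: time until the event or until observation ended (e.g. months).
    observed:  True if the event (e.g. abandonment) occurred, False if censored.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone whose duration reaches t is still at risk just before t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical developers: months of activity and whether abandonment was observed.
durations = [3, 5, 5, 8, 12]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for months, probability in curve:
    print(months, round(probability, 3))
```

For this sample the estimated survival probability drops to roughly 0.8 after 3 months, 0.6 after 5 months and 0.3 after 8 months. The two censored developers never trigger a drop themselves, but they enlarge the at-risk population for earlier events.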