The document discusses a study of workload variation and specialization among contributors to the Gnome open source software ecosystem. It finds that most contributors are occasional and involved in a single activity type, like translation, while frequent contributors specialize in a few activities like coding. The study used statistical analysis of git repository data to identify contributors, their activities, and specialization levels over time in the Gnome community. Future work could examine subsets of projects and trends in contributor migration patterns.
Gnome Workload Variation Study
1. On the variation and
specialisation of workload
The Gnome case
B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens
Tuesday 4 December 2012
2. Gnome as an ecosystem
• Ecosystem: set of interconnected projects
• ~ 1400 projects
• ~ 3000 contributors
• 15 years of activity
Variation and specialisation of workload Benevol 2012
3. How does workload vary
across contributors?
• Who are they?
• What do they do?
• How do they do it?
A partial answer by analysing the git repositories.
4. Who are the contributors?
5. Identity matching
• Contributors have an account per project
repository…
• … and sometimes more than one.
• No explicit links between the accounts,
need to guess them.
• Based on names and e-mails found in the
git repositories.
6. Identity matching (cont.)
• (Semi-)automatic classification techniques.
• Must take into account variations, abbreviations,
permutations, misspellings, nicknames, etc.
• No perfect process: even a manually post-checked result can
contain false positives and false negatives.
• Since Gnome as a whole has no strict identification policy,
some matches cannot be detected without extra
context information. Fictitious example:
• Robbie Williams <robbiew@gnome.org>
• Euphegenia Doubtfire <euphegenia@gmail.com>
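A minimal sketch of the kind of name and e-mail heuristic described above, assuming accounts are (name, e-mail) pairs. The function names, the 0.85 similarity threshold and the extra account "Williams, Robbie" are hypothetical illustrations, not the authors' actual matching procedure.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def normalise_name(name):
    """Lowercase, strip accents and punctuation, and sort the name tokens."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = re.findall(r"[a-z]+", ascii_name.lower())
    return " ".join(sorted(tokens))  # sorting absorbs "Last, First" permutations


def email_prefix(email):
    """Part of the address before the '@', lowercased."""
    return email.split("@", 1)[0].lower()


def likely_same(a, b, threshold=0.85):
    """Heuristic: same e-mail prefix, or near-identical normalised names.

    The threshold is an assumed value; a real study would tune it and
    manually review the candidate matches.
    """
    (name_a, mail_a), (name_b, mail_b) = a, b
    if email_prefix(mail_a) == email_prefix(mail_b):
        return True
    ratio = SequenceMatcher(None, normalise_name(name_a), normalise_name(name_b)).ratio()
    return ratio >= threshold


def candidate_pairs(accounts):
    """All pairs of accounts that the heuristic would merge."""
    return [(a, b) for a, b in combinations(accounts, 2) if likely_same(a, b)]


accounts = [
    ("Robbie Williams", "robbiew@gnome.org"),
    ("Williams, Robbie", "rwilliams@example.org"),  # invented variant
    ("Euphegenia Doubtfire", "euphegenia@gmail.com"),
]
pairs = candidate_pairs(accounts)
```

With these inputs the heuristic pairs the two "Williams" accounts but cannot link the Doubtfire account to either of them, which is the limitation the fictitious example points at: names and e-mails alone carry no hint that the accounts belong together.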
7. What do the
contributors do?
8. 13 activity types
• Identified by the path, name and extension
of the touched files.
• Coding: *.c, *.java, etc.
• Translation: *.po, etc.
• Testing: */test/*, etc.
• ...
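A rule-based classifier of this kind could be sketched as follows. Only the three activity types and the patterns shown on the slide come from the source; the rule order, the "unknown" fallback and the function name are assumptions, and the remaining activity types are left out because the slide elides them.

```python
from fnmatch import fnmatchcase

# Ordered rules: the first matching activity wins (the order is an assumption).
RULES = [
    ("testing", ["*/test/*"]),
    ("translation", ["*.po"]),
    ("coding", ["*.c", "*.java"]),
]


def activity_of(path):
    """Return the activity type of a touched file, or "unknown" if no rule matches."""
    for activity, patterns in RULES:
        # fnmatchcase lets "*" match across "/" and is case-sensitive on every OS.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return activity
    return "unknown"


examples = ["src/main.c", "po/fr.po", "src/test/check_main.c", "README"]
labels = [activity_of(p) for p in examples]
```

Placing the path rule before the extension rules means a C file under a test directory counts as testing rather than coding; the slides do not say how such overlaps were resolved.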
9. How do the contributors
contribute?
10. Metrics
• APTW(p,c,t): number of files touched by
contributor c performing an activity of
type t in project p.
• Derived metrics, by aggregation: max, sum,
etc.
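As an illustration, APTW and two sum aggregations could be computed as below. This sketch assumes one input record per file touch, given as a (project, contributor, activity type) triple, and counts every touch rather than distinct files; the slide does not settle that point. The helper names and the sample data are hypothetical.

```python
from collections import Counter


def aptw(touches):
    """Map (project, contributor, activity) to the number of file touches."""
    return Counter((p, c, t) for p, c, t in touches)


def total_workload(table, contributor):
    """Sum of APTW over all projects and activity types for one contributor."""
    return sum(n for (p, c, t), n in table.items() if c == contributor)


def workload_by_activity(table, contributor):
    """Sum of APTW over all projects, kept separate per activity type."""
    result = Counter()
    for (p, c, t), n in table.items():
        if c == contributor:
            result[t] += n
    return result


touches = [
    ("gedit", "alice", "coding"),
    ("gedit", "alice", "coding"),
    ("gedit", "alice", "translation"),
    ("gimp", "alice", "coding"),
    ("gimp", "bob", "translation"),
]
table = aptw(touches)
```

Here `table[("gedit", "alice", "coding")]` is 2, alice's total workload is 4, and her per-activity workload is 3 for coding and 1 for translation.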
11. Workload
• 50% of contributors made < 14 changes.
• 1 contributor made 185,874 changes.
[Figure: histogram of the number of authors (y-axis, 0 to 600) against log(AW) (x-axis, 0 to 12)]
Université de Mons, Doctoral training report 2011, Mathieu Goeminne
12. The more things you do,
the more things you can!
• Correlations
• Between the number of activity types and
the workload.
• Between the number of projects and the
workload.
13. Favorite activities of contributors
having ≥ 14 changes
• Most frequent
contributors
specialise in coding
and development
documentation.
• The other activities
are not subject to
specialisation.
14. Favorite activities of contributors
having < 14 changes
• Most occasional
contributors
specialise in
translation and
coding.
• The other activities
are not subject to
specialisation.
15. How strongly do
contributors focus?
• Basic measure: RATW(c,t)
• % of the total workload of c dedicated to t.
• Use of Gini as inequality index:
• Value in [0, 1)
• 0 if the workload is equally distributed.
• Close to 1 if the workload is
concentrated in few activity types.
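A small sketch of the two measures, assuming RATW is the share of a contributor's workload per activity type (returned here as a fraction rather than a percentage) and using the standard mean-absolute-difference form of the Gini index. The function names and the four activity types in the example are hypothetical; the study distinguishes 13 types.

```python
def ratw(workload_by_type, activity_types):
    """Share of the contributor's total workload spent on each activity type."""
    total = sum(workload_by_type.get(t, 0) for t in activity_types)
    if total == 0:
        raise ValueError("contributor has no workload")
    return [workload_by_type.get(t, 0) / total for t in activity_types]


def gini(values):
    """Gini index: sum of |x_i - x_j| over all pairs, divided by 2 * n * sum(x)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    pairwise = sum(abs(a - b) for a in values for b in values)
    return pairwise / (2 * n * total)


types = ["coding", "translation", "testing", "documentation"]
shares = ratw({"coding": 3, "translation": 1}, types)  # [0.75, 0.25, 0.0, 0.0]
focus = gini(shares)                                   # 0.625
```

With n activity types the index is 0 for an even split and reaches at most 1 - 1/n when all work falls in one type (0.75 for the four types here, 12/13 for 13 types), which is why the value stays strictly below 1.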
16. Contributor’s focus (cont.)
• Occasional contributors typically participate
in a single activity type.
• Frequent contributors typically participate
in few activity types.
17. To summarise
18. What did we learn?
• Most contributors are occasional and are involved
in only one activity type; few are very active;
frequent contributors are involved in few activity
types.
• The more things you do, the more things you can.
• Occasional contributors are translators, involved
in many projects. Frequent contributors are
coders and are involved in few projects.
• And more in our paper.
19. How did we do it?
• Contributor matching: semi-automatic
and automatic methods.
• Activity identification based on file
path/name/extension rules.
• Advanced statistical analysis (among
other things, for the partial ordering of activity
types).
• Specialisation: aggregation with inequality
indices.
20. In the future
• Add a temporal aspect: how does
contributors' behaviour change over time?
• Consider subsets of Gnome: sub-ecosystems
composed of projects that share more
properties than Gnome projects do on average
(archived, by theme, etc.).
• Combine both by studying migration trends.
•…
21. Thank you
On the variation and specialisation of workload – A case study of the Gnome ecosystem
community
B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens
Empirical Software Engineering
Awaiting acceptance