On the variation and
                   specialisation of workload
                       The Gnome case
                        B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens




Tuesday 4 December 2012
Gnome as an ecosystem

                   • Ecosystem: set of interconnected projects
                   • ~ 1400 projects
                   • ~ 3000 contributors
                   • 15 years of activity

                          Variation and specialisation of workload   Benevol 2012
How does workload vary
                 across contributors?

                   • Who are they?
                   • What do they do?
                   • How do they do it?
              A partial answer by analysing the git repositories.

Who are the contributors?



Identity matching
                   • Contributors have an account per project
                        repository…
                   • … and sometimes more than one.
                   • No explicit links between the accounts;
                        they must be inferred.
                   • Based on names and e-mails found in the
                        git repositories.
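The simplest linking step can be sketched as follows, assuming each account is a (name, e-mail) pair taken from git commit metadata and that two accounts belong to the same person whenever their normalised names or normalised e-mails coincide. This exact-match rule and the helper names `normalise` and `match_identities` are our own illustration, not the authors' method; it only catches the easy cases, and links are followed transitively with a union-find structure.

```python
from collections import defaultdict


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace (hypothetical normalisation rule)."""
    return " ".join(text.lower().split())


def match_identities(accounts: list[tuple[str, str]]) -> list[set[int]]:
    """Group indices of (name, e-mail) accounts that share a normalised
    name or a normalised e-mail address, transitively (union-find)."""
    parent = list(range(len(accounts)))

    def find(i: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    first_seen: dict[str, int] = {}
    for idx, (name, email) in enumerate(accounts):
        for kind, raw in (("name", name), ("mail", email)):
            value = normalise(raw)
            if not value:
                continue  # an empty field must not link unrelated accounts
            key = f"{kind}:{value}"
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups: dict[int, set[int]] = defaultdict(set)
    for idx in range(len(accounts)):
        groups[find(idx)].add(idx)
    return list(groups.values())


# Made-up accounts: 0 and 1 share a name, 1 and 2 share an e-mail.
accounts = [
    ("Jane Doe", "jane@gnome.org"),
    ("jane  doe", "jdoe@example.com"),
    ("J. Doe", "jdoe@example.com"),
    ("John Roe", "roe@example.org"),
]
print(match_identities(accounts))  # [{0, 1, 2}, {3}]
```

Account 2 is linked to account 0 only through account 1, which is why the grouping must be transitive.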

Identity matching (cont.)
                   •    (Semi-)automatic classification techniques.
                   •    Must take into account variations, abbreviations,
                        permutations, misspellings, nicknames, etc.
                   •    No process is perfect: even a manually post-checked result can
                        contain false positives and false negatives.
                   •    Since Gnome as a whole has no strict identification policy,
                        some matches cannot be detected without extra
                        contextual information. Fictitious example:
                        •   Robbie Williams <robbiew@gnome.org>
                        •   Euphegenia Doubtfire <euphegenia@gmail.com>
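A fuzzier, name-based matcher that handles some of the variations listed above (permutations, accents, small misspellings, one-letter initials) might look like the sketch below. The 0.85 similarity threshold, the two-token minimum and all function names are hypothetical choices for illustration, not the classification technique used in the study. As the slide points out, no such string heuristic can link the fictitious pair above: that takes outside context.

```python
import unicodedata
from difflib import SequenceMatcher


def name_tokens(name: str) -> list[str]:
    """Lower-case, strip accents and punctuation, and split into tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return cleaned.split()


def abbreviation_match(short: list[str], longer: list[str]) -> bool:
    """Each token of `short` must match a distinct token of `longer`,
    either exactly or as a one-letter initial (e.g. 'j' for 'jane')."""
    remaining = list(longer)
    # Longer tokens first, so full words are paired before initials.
    for tok in sorted(short, key=len, reverse=True):
        hit = next(
            (r for r in remaining
             if r == tok or (len(tok) == 1 and r.startswith(tok))),
            None,
        )
        if hit is None:
            return False
        remaining.remove(hit)
    return True


def same_person(name_a: str, name_b: str, threshold: float = 0.85) -> bool:
    """Heuristic: do two author names plausibly denote the same person?"""
    a, b = name_tokens(name_a), name_tokens(name_b)
    short, longer = sorted((a, b), key=len)
    if len(short) < 2:
        return False  # a lone token (e.g. a nickname) is too ambiguous
    # Permutations and small misspellings: compare sorted token strings.
    ratio = SequenceMatcher(None, " ".join(sorted(a)), " ".join(sorted(b))).ratio()
    if ratio >= threshold:
        return True
    # Abbreviations: fall back to initial-aware token pairing.
    return abbreviation_match(short, longer)


print(same_person("Doe, Jane", "Jane Doe"))                     # True (permutation)
print(same_person("J. Doe", "Jane Doe"))                        # True (initial)
print(same_person("José García", "Jose Garcia"))                # True (accents)
print(same_person("Robbie Williams", "Euphegenia Doubtfire"))  # False
```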


What do the
                        contributors do?


13 activity types
                   • Identified by the path, name and extension
                        of the touched files.
                        • Coding: *.c, *.java, etc.
                        • Translation: *.po, etc.
                        • Testing: */test/*, etc.
                        • ...
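Rule-based identification of this kind can be sketched with shell-style patterns. The rule table below contains only the example patterns shown on the slide; the other activity types are not reproduced. That rules are tried in order with the first match winning (so a C file under `test/` counts as Testing, not Coding) is our assumption, not something the slide states.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical rule table holding only the example patterns from the slide.
# Assumption: rules are tried in order and the first match wins, so the
# path-based Testing rule takes precedence over extension-based rules.
RULES: list[tuple[str, list[str]]] = [
    ("testing", ["*/test/*"]),
    ("translation", ["*.po"]),
    ("coding", ["*.c", "*.java"]),
]


def activity_type(path: str) -> Optional[str]:
    """Return the activity type of a touched file, or None if no rule applies."""
    for activity, patterns in RULES:
        # fnmatchcase is case-sensitive and its '*' also matches '/'.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return activity
    return None


for path in ["po/fr.po", "src/main.c", "src/test/check_main.c", "README"]:
    print(path, "->", activity_type(path))
```

With the ordering assumption, `src/test/check_main.c` is classified as testing; reversing the table would classify it as coding, so rule order is a real design decision.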
How do the contributors
                contribute?


Metrics

                   • APTW(p,c,t): number of files touched by
                        contributor c performing an activity of
                        type t in project p.
                   • Derived metrics by aggregation (max, sum,
                        etc.).
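Computing APTW and two aggregated metrics might look like this. We assume one input record per file touched in a commit, already labelled with its activity type, so a file touched in several commits counts several times; the slide does not say whether repeated touches are counted once or each time. The records are made up, and `total_workload` and `max_project_workload` are hypothetical names for the "sum" and "max" aggregations.

```python
from collections import Counter, defaultdict
from typing import Iterable

Touch = tuple[str, str, str]  # (project p, contributor c, activity type t)


def aptw(touches: Iterable[Touch]) -> Counter:
    """APTW(p, c, t): one record per file touched, so counting identical
    records gives the number of files touched by c for activity t in p."""
    return Counter(touches)


def total_workload(counts: Counter) -> dict[str, int]:
    """Derived metric: sum of APTW over all projects and activity types."""
    totals: dict[str, int] = defaultdict(int)
    for (_project, contributor, _activity), n in counts.items():
        totals[contributor] += n
    return dict(totals)


def max_project_workload(counts: Counter) -> dict[str, int]:
    """Derived metric: each contributor's largest workload in a single project."""
    per_project: Counter = Counter()
    for (project, contributor, _activity), n in counts.items():
        per_project[(project, contributor)] += n
    best: dict[str, int] = {}
    for (_project, contributor), n in per_project.items():
        best[contributor] = max(best.get(contributor, 0), n)
    return best


touches = [  # made-up records
    ("gedit", "alice", "coding"),
    ("gedit", "alice", "coding"),
    ("gedit", "alice", "translation"),
    ("nautilus", "alice", "coding"),
    ("nautilus", "bob", "translation"),
]
counts = aptw(touches)
print(counts[("gedit", "alice", "coding")])  # 2
print(total_workload(counts))                # {'alice': 4, 'bob': 1}
print(max_project_workload(counts))          # {'alice': 3, 'bob': 1}
```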



Workload
                   [Figure: histogram of contributors by workload. x-axis: log(AW), 0 to 12; y-axis: number of authors, 0 to 600.]
                   • 50% of contributors made < 14 changes.
                   • 1 contributor made 185,874 changes.
                   Source: Université de Mons, Doctoral training report 2011, Mathieu Goeminne
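Since half of the contributors made fewer than 14 changes, 14 serves as the boundary between occasional (< 14) and frequent (≥ 14) contributors in the slides that follow. The sketch below performs that split and bins workloads on a log scale as in the figure. The logarithm base is not given; we assume the natural logarithm, which is at least consistent with the axis ending near 12 (ln 185,874 ≈ 12.1). The data and function names are hypothetical.

```python
import math
from collections import Counter


def split_contributors(workload: dict[str, int],
                       threshold: int = 14) -> tuple[set[str], set[str]]:
    """Occasional contributors have fewer than `threshold` changes; the rest are frequent."""
    occasional = {c for c, w in workload.items() if w < threshold}
    return occasional, set(workload) - occasional


def log_histogram(workload: dict[str, int], bin_width: float = 1.0) -> Counter:
    """Number of authors per bin of log(AW), using the natural logarithm (an assumption)."""
    return Counter(
        math.floor(math.log(w) / bin_width) for w in workload.values() if w > 0
    )


workload = {"ann": 1, "ben": 5, "cai": 14, "dev": 200}  # made-up data
occasional, frequent = split_contributors(workload)
print(sorted(occasional), sorted(frequent))     # ['ann', 'ben'] ['cai', 'dev']
print(sorted(log_histogram(workload).items()))  # [(0, 1), (1, 1), (2, 1), (5, 1)]
```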
The more things you do,
         the more things you can!
             • Correlations
              • Between the number of activity types and
                        the workload.
                   • Between the number of projects and the
                        workload.
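The slide does not name the correlation coefficient. Because workload is extremely skewed (from a handful of changes to 185,874), a rank-based coefficient is a natural choice; the sketch below computes Spearman's rho (the Pearson correlation of tie-averaged ranks) in plain Python. Choosing Spearman is our assumption, and the data are made up.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Made-up data: number of activity types per contributor vs. their workload.
n_activity_types = [1, 1, 2, 3, 5]
workload = [3, 10, 40, 500, 185874]
print(round(spearman(n_activity_types, workload), 3))  # 0.975
```

The same function applies unchanged to the number of projects per contributor.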



Favourite activities of contributors
                     with ≥ 14 changes

                   • Most of the frequent
                        contributors
                        specialise in coding
                        and development
                        documentation.
                   • The other activities
                        are not subject to
                        specialisation.

Favourite activities of contributors
                      with < 14 changes

                   • Most of the occasional
                        contributors
                        specialise in
                        translation and
                        coding.
                   • The other activities
                        are not subject to
                        specialisation.

How strongly do
                        contributors focus?
                • Basic measure: RATW(c,t)
                  • % of the total workload of c dedicated to t.
                  • Gini coefficient used as inequality index:
                    • Value in [0, 1)
                      • 0 if the workload is equally distributed.
                      • Close to 1 if the workload is
                          concentrated in few activity types.
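RATW and a Gini-based specialisation score can be sketched as follows. The slides do not give the Gini estimator; we use the mean-absolute-difference form. Activity types a contributor never touched must be included as zero shares (we pad to the 13 types), otherwise someone active in a single type would look perfectly balanced. With n types this Gini peaks at (n - 1)/n, here 12/13, and never reaches 1, which is consistent with the half-open range [0, 1). Function names are hypothetical.

```python
def ratw(workload_by_type: dict[str, int]) -> dict[str, float]:
    """RATW(c, t): percentage of contributor c's total workload spent on type t."""
    total = sum(workload_by_type.values())
    if total == 0:
        raise ValueError("contributor has no workload")
    return {t: 100 * w / total for t, w in workload_by_type.items()}


def gini(values: list[float]) -> float:
    """Gini coefficient via the mean absolute difference over all pairs."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    abs_diffs = sum(abs(a - b) for a in values for b in values)
    return abs_diffs / (2 * n * n * mean)


def specialisation(workload_by_type: dict[str, int], n_types: int = 13) -> float:
    """Gini over RATW shares, padding the activity types c never touched with 0."""
    shares = list(ratw(workload_by_type).values())
    shares += [0.0] * (n_types - len(shares))
    return gini(shares)


print(ratw({"coding": 30, "translation": 10}))  # {'coding': 75.0, 'translation': 25.0}
print(specialisation({"translation": 7}))        # 12/13, about 0.923
```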

Contributor’s focus (cont.)

         • Occasional contributors typically participate
                in a single activity type.
         • Frequent contributors typically participate
                in few activity types.




To summarise



What did we learn?
                   •    Most contributors are occasional and are involved
                        in only one activity type; few are very active;
                        frequent contributors are involved in few activity
                        types.
                   •    The more things you do, the more things you can.
                   •    Occasional contributors are translators, involved
                        in many projects. Frequent contributors are
                        coders and are involved in few projects.
                   •    More results in our paper.

How did we do it?
                   • Contributor matching: semi-automatic
                        and automatic methods.
                   • Activity identification based on file
                        path/name/extension rules.
                   • Advanced statistical analysis (among
                        others for the partial ordering of activity
                        types).
                   • Specialisation: aggregation with inequality
                        indices.
In the future
                   • Add a temporal aspect: How does the
                        contributors’ behaviour change over time?
                   • Consider subsets of Gnome: sub-ecosystems
                        composed of projects that share stronger
                        properties than all projects do on average
                        (archived, by theme, etc.).
                   • Combine both by studying migration trends.
                   •…
Thank you

       On the variation and specialisation of workload – A case study of the Gnome ecosystem
       community
       B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens
       Empirical Software Engineering
       Submitted, awaiting acceptance
