Open Development Analytics consists of publishing detailed, up-to-date analytics about the processes and community behind a project.
Providing this information in the open is a further step in
transparency: it helps improve the project itself and helps third parties make informed decisions. The talk will present Open Development Analytics in detail and explain why it is a next step towards more project transparency.
Open Development Analytics, a step beyond in project transparency
1. Open Development Analytics
A Step Towards More Project Transparency
(Reduced version)
Jesus M. Gonzalez-Barahona
jgb@bitergia.com @jgbarah http://speakerdeck.com/jgbarah
Bitergia / LibreSoft (URJC)
Open Source Summit
Paris (France), November 16th 2016
Jesus Gonzalez-Barahona (Bitergia) Open Development Analytics Paris, Nov 2016 1 / 54
6. Structure of the presentation
1 A bit of context
2 Transparency and governance
3 Open development analytics
4 How are changes being reviewed?
5 Dependency
6 Dealing with issues?
7 Diversity
8 The end
7. A bit of context
8. Me and my two hats
Uni Rey Juan Carlos:
LibreSoft research team
Understanding free, open source software
Data analytics approach
Bitergia:
From research to the real world
Understanding software development
Data analytics approach
http://gsyc.es/~jgb
9. The company
The software development analytics company
dashboards
reports
consultancy
...
http://bitergia.com
11. Who drives open software development?
12. Who drives open software development
A community
Persons (and organizations) with
common goals
different interests
14. Self-awareness
Open development communities
need to be self-aware
data is the source for awareness...
when it can be used for “sensing”
The same applies
to any open organization
15. Governance
“Establishment of policies, and continuous
monitoring of their proper implementation, by the
members of the governing body of an
organization. It includes the mechanisms required
to balance the powers of the members (with the
associated accountability), and their primary duty
of enhancing the prosperity and viability of the
organization.”
http://businessdictionary.com
17. Transparency
It comes in two flavors
Transparency to the community
(fairness)
Transparency to third parties
(trust)
Which for open organizations are kind of the same
18. Transparency
Example of rationale (OpenStack):
“OpenStack favors disclosure and transparency to
promote sharing and collaboration within the
OpenStack community”
https://www.openstack.org/legal/transparency-policy/
19. Transparency: showing the data is not enough
21. A new dimension of openness
When we develop in the open
we produce a great deal of data
about how we develop
“Show me the development data”
as a step beyond
“show me the code”
22. From open development to open development analytics
Information about code, community, development
for open development projects
can be retrieved, organized, analyzed
Let’s publish analytics results & data
Open Development Analytics:
A new standard for transparency
23. Open development analytics
Who may benefit?
Developers
Project managers
Community managers
Evaluators
...
Anyone interested in the health of the project
24. Who may benefit?
Slide used by Jim Zemlin at LF Collab 2016
25. Some areas of interest
Performance (understanding activity)
Company participation (beyond copyright
notices)
Transparency (available information)
Auditing (certify participation, experience, etc.)
Profiling (key people, companies)
Neutrality (fair treatment)
26. How are changes being reviewed?
27. Some reviewers are more equal than others
http://blog.bitergia.com/2015/12/30/some-developers-are-more-equal-than-others/
28. Neutrality?
[Figure: iterations per accepted review (median, y axis from 0 to 3) against number of accepted reviews (x axis, ticks at 250, 500, 1000, 2000, 4000)]
30. Apache Pony Factor
In the words of Daniel Gruno:
We [the ASF] created a term we have coined
“Pony Factor” (because ASF is full of ponies, or
people who think they are ponies). Pony Factor
(PF) shows the diversity of a project in terms of
the division of labor among committers in a
project.
Pony Factor is determined as:
“The lowest number of committers whose
total contribution constitutes the majority of
the codebase”
https://ke4qqq.wordpress.com/2015/02/08/pony-factor-math/
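The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ASF's own tooling: contribution is counted in commits, "majority" is read as strictly more than half, and the function name and the sample history are made up.

```python
from collections import Counter


def pony_factor(commit_authors):
    """Smallest number of committers whose commits exceed half of the total.

    `commit_authors` holds one committer name per commit (hypothetical input).
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk committers from most to least active, accumulating their commits.
    for rank, (_, commits) in enumerate(counts.most_common(), start=1):
        covered += commits
        if 2 * covered > total:  # strict majority (an assumption)
            return rank
    return 0  # no commits at all


# Made-up history: ana has 5 commits, bo 3, cy 2.
history = ["ana"] * 5 + ["bo"] * 3 + ["cy"] * 2
print(pony_factor(history))
```

With this history, ana alone holds exactly half of the commits, which is not a majority, so the second committer is needed and the factor is 2.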
32. Bitergia Elephant Factor
Projects can benefit from powerful collaborations
from companies (elephants). The elephant factor
shows the diversity of a project in terms of the
division of labor among companies (by means of
developers affiliated with them).
Elephant factor is determined as:
“The lowest number of companies whose
total contribution (in commits by their
employees) constitutes the majority of the
commits”
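A minimal sketch of this computation, assuming an author-to-company mapping is available. The mapping, the names, and the choice to pool unaffiliated authors under a single "Unknown" bucket are all illustrative assumptions, and "majority" is read as strictly more than half.

```python
from collections import Counter


def elephant_factor(commit_authors, affiliations, unknown="Unknown"):
    """Smallest number of companies whose employees' commits exceed half of all commits.

    `commit_authors`: one author name per commit (hypothetical input).
    `affiliations`: dict mapping author name to company (hypothetical input).
    """
    # Attribute every commit to the company of its author.
    by_company = Counter(affiliations.get(author, unknown) for author in commit_authors)
    total = sum(by_company.values())
    covered = 0
    for rank, (_, commits) in enumerate(by_company.most_common(), start=1):
        covered += commits
        if 2 * covered > total:  # strict majority (an assumption)
            return rank
    return 0  # no commits at all


# Made-up data: two Acme employees and one Globex employee.
authors = ["ana"] * 4 + ["bo"] * 3 + ["cy"] * 3
companies = {"ana": "Acme", "bo": "Acme", "cy": "Globex"}
print(elephant_factor(authors, companies))
```

In this example no single developer holds a majority, but Acme's two employees together account for 7 of the 10 commits, so one company is enough. Pooling unaffiliated authors into one bucket treats them as a single company, which can understate diversity; a real analysis would need a policy for them.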
33. Code “owned”
“The land belongs
to its workers”
Emiliano Zapata
34. Code “owned”
The code changes over time. The current version is
“owned” by the people who produced it.
The code “belongs” to those who wrote it.
Zapata factor (work in progress):
“The lowest number of developers for whom
the total number of lines of code they “own”
(were last touched by them) constitutes the
majority of the lines of code”
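One possible way to approximate line "ownership" is the output of `git blame --line-porcelain`, which names the author who last touched each line. The sketch below works on that text format; it is an assumption about how the factor could be computed, not the method used for the talk, and the helper names and the generated sample are hypothetical.

```python
from collections import Counter


def line_owners(blame_porcelain):
    """Return the last author of each line from `git blame --line-porcelain` text."""
    prefix = "author "
    # File content lines start with a tab, so they never match the prefix.
    return [line[len(prefix):] for line in blame_porcelain.splitlines()
            if line.startswith(prefix)]


def zapata_factor(owners):
    """Smallest number of developers who "own" more than half of the lines."""
    counts = Counter(owners)
    total = sum(counts.values())
    covered = 0
    for rank, (_, lines) in enumerate(counts.most_common(), start=1):
        covered += lines
        if 2 * covered > total:  # strict majority (an assumption)
            return rank
    return 0  # no lines at all


def fake_blame(authors):
    """Build a simplified stand-in for blame output (not real repository data)."""
    chunks = []
    for n, author in enumerate(authors, start=1):
        chunks += [
            f"{'0' * 40} {n} {n} 1",
            f"author {author}",
            f"author-mail <{author.lower()}@example.org>",
            f"\tauthor = {n}",  # content line, must be ignored
        ]
    return "\n".join(chunks)


sample = fake_blame(["Ana", "Bo", "Ana", "Cy", "Bo"])
print(zapata_factor(line_owners(sample)))
```

Run over every tracked file and summed, the same counts would give a project-wide figure. Blame attributes a line to whoever last changed it, so whitespace-only edits or mass reformatting can shift "ownership" without any real authorship.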
35. Diversity: Code “owned”
[Linux kernel, July 2016, Zapata factor: 200]
36. Code “owned”
The code “belongs” to companies who employ
developers changing it.
United Fruit factor (work in progress):
“The lowest number of companies for whom
the total number of lines of code they “own”
(were last touched by their employees)
constitutes the majority of the lines of code”
38. Dealing with issues?
39. Issues may not be processed as intended
Policy (or recommendations) may mandate transitions
but are they real?
Time to close when the same company reports and fixes?
Time to close for external bug reports?
Time to close depending on who reports?
Who opens tickets that nobody cares about?
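Questions like these reduce to comparing time-to-close distributions across groups of reporters. The following sketch shows the idea with a median per group; the record layout, the field names, and the sample issues are hypothetical and would depend on the issue tracker in use.

```python
from datetime import datetime
from statistics import median


def median_days_to_close(issues):
    """Median days from opening to closing, per reporter group."""
    durations = {}
    for issue in issues:
        if issue["closed"] is None:  # still open: no time to close yet
            continue
        days = (issue["closed"] - issue["opened"]).total_seconds() / 86400
        durations.setdefault(issue["reporter_group"], []).append(days)
    return {group: median(values) for group, values in durations.items()}


# Made-up issues; "internal" means reporter and fixer share a company.
issues = [
    {"reporter_group": "internal", "opened": datetime(2016, 1, 1), "closed": datetime(2016, 1, 3)},
    {"reporter_group": "internal", "opened": datetime(2016, 1, 5), "closed": datetime(2016, 1, 9)},
    {"reporter_group": "external", "opened": datetime(2016, 1, 2), "closed": datetime(2016, 1, 12)},
    {"reporter_group": "external", "opened": datetime(2016, 1, 4), "closed": None},
]
print(median_days_to_close(issues))
```

Skipping open issues biases the result towards fast closures, so the share of issues that never close is worth reporting alongside the medians.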
40. Example: The “mandated” changes of state
41. The real changes of state
43. Geography
Geographical diversity is difficult to assess
Companies can keep detailed records, but open
communities are different
Fortunately, some tools leave traces...
This allows for better knowledge
...and better tracking of initiatives
Example: policies to enlarge the number of developers
in XXX region
44. Geography: time zones in git records
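Git stores the author's UTC offset with every commit, and `git log --pretty=format:%ai` prints each author date with that offset at the end. A minimal sketch of turning such lines into a time-zone histogram follows; the function names and the sample dates are made up, and this is not necessarily how the figures in the talk were produced.

```python
from collections import Counter


def utc_offset_hours(date_line):
    """Return the UTC offset, in hours, of a date ending in +HHMM or -HHMM."""
    tz = date_line.split()[-1]
    sign = -1 if tz[0] == "-" else 1
    return sign * (int(tz[1:3]) + int(tz[3:5]) / 60)


def offset_histogram(date_lines):
    """Count commits per UTC offset, skipping blank lines."""
    return Counter(utc_offset_hours(line) for line in date_lines if line.strip())


# Made-up author dates in the format printed by `git log --pretty=format:%ai`.
sample = [
    "2016-11-16 10:15:00 +0100",
    "2016-11-15 23:40:12 -0800",
    "2016-11-14 09:00:00 +0100",
    "2016-11-13 18:30:00 +0530",
]
print(offset_histogram(sample))
```

The offset reflects how the author's machine was configured at commit time, so it is only a coarse proxy for location: one offset spans many countries, and daylight saving time shifts it during the year.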
46. Gender: Analyzing by name
Current situation of gender imbalance in OpenStack
Gender   Developers   Commits   Commits/devel
Female          750    14,647            19.5
Male          4,632   207,112            44.7
Only names classified with more than 80% certainty.
[Work in progress, preliminary results]
47. Gender: Analyzing by name
Commits by women: 6.8% (4 Kcommits)
Women: 9.9% (330 developers)
Linux kernel, Nov 2015 – Oct 2016
49. Open Development Analytics Live: OPNFV dashboard
http://opnfv.biterg.io
50. Summary
Open Development Analytics
A step forward in project
transparency
http://grimoirelab.github.io
http://speakerdeck.com/jgbarah
51. A moment for a commercial: Join us at MSR 2017!!
http://2017.msrconf.org
14th International
Conference on
Mining Software
Repositories
Co-located with ICSE
Buenos Aires, Argentina
Save the dates:
May 20-21 2017
Start the conversation!!!
#msr17
52. License
© 2016 Bitergia
Some rights reserved.
This presentation is distributed under the
“Attribution-ShareAlike 3.0” license, by Creative Commons,
available at
http://creativecommons.org/licenses/by-sa/3.0/
53. Credits (1)
“Man With Two Hats”
Statue by Henk Visch, located in Ottawa, Canada
Picture by Lezumbalaberenjena in Wikimedia Commons
License: Public domain
https://commons.wikimedia.org/wiki/File:Man_With_Two_Hats_Ottawa_Statue_by_lezumbalaberenjena.jpg
“Napoleon’s Russian campaign of 1812”
Original by Charles Minard
License: Public domain
https://en.wikipedia.org/wiki/Charles_Joseph_Minard#/media/File:Minard.png
“Aged Come In We’re Open”
Picture by Czarina Alegre in Flickr
License: Creative Commons Attribution 2.0
https://flic.kr/p/fjGamh
54. Credits (2)
“Good code”
Comic by Randall Munroe, XKCD 844
License: Creative Commons Attribution-NonCommercial 2.5
http://xkcd.com/844/
“Crowd at FOSDEM 2008”
Picture by Jesús Corrius in Flickr
License: Creative Commons Attribution 2.0
http://www.flickr.com/photos/jcorrius/2302302707/
“Elephant”
Picture by ajoheyho
License: Creative Commons Public Domain
https://pixabay.com/en/elephant-african-bush-elephant-114543/
“Emiliano Zapata”
License: Public Domain