How temporal network analysis can help us to explore existing interrelationships in online production systems
1. How temporal network analysis can help us to explore existing interrelationships in online production systems
Dr. Claudia Müller-Birn
Institute for Computer Science, Group Networked Information Systems
January 20, 2011
Invited Talk, GESIS, Bonn
2. When you think of the Social Web...
How temporal network analysis can help us to explore existing interrelationships in online production systems. January 20, 2011 2
Claudia Müller-Birn
3. Social participation creates digital products
Figure: STEM (Spatio-Temporal Exploratory Model) map of Dickcissel (http://ebird.org)
Figure: Can Distributed Volunteers Accomplish Massive Data Analysis Tasks? (Kanefsky et al., 2001)
Figure: Graph of source lines of code added [millions] (Deshpande & Riehle, 2008), dataset based on www.ohloh.net
Figure: Number of articles on English-language Wikipedia from its creation in 2001 through June 2010 (Riedl, 2011)
4. Social participation creates digital products
• Geographically distributed communities
• Very large number of granular, individual contributions
• Openness of boundaries, technical standards, communication and information sources
• Peering as a new form of horizontal organization
• Sharing of intellectual property
(Benkler, 2006), (O'Mahony, 2007), (Tapscott, 2007)
5. Outline
• Dimensions in online production systems and existing research issues
• Success in online production systems
• Mirroring hypothesis in online production systems
(research in progress)
• Recent and future research challenges
6. Dimensions in online production systems
Pooled product, structured product, integral product
7. Selected research issues in online production systems
MODELING
• How do we model the dimensions of online production systems?
• Which network descriptions are especially useful?
• What are appropriate data sources?
QUALITY/SUCCESS
• How do we measure quality or success?
• How do online production systems strive for quality?
EVOLUTION
• How do the social and the technical dimensions co-evolve?
• What techniques can be used for measuring and describing evolution?
INFLUENCE
• How do we measure the influence of the technical dimension on the social dimension and vice versa?
• Are specific structures of networks more influential than others?
8. Success in online production systems:
A longitudinal analysis of the socio-technical duality of
development projects*
Müller-Birn, C., Cataldo, M., Wagstrom, P., Herbsleb, J.D.: Success in Online Production
Systems: A Longitudinal Analysis of the Socio-Technical Duality of Development Projects.
Technical Report CMU-ISR-10-129, 2010.
9. What might be success factors for OPSs?
Success of virtual community sites (Preece, 2000):
Usability: human-technology interactions (e.g., information design, navigation, and access)
Sociability: human-human interactions, supported by developing policies and practices that are socially acceptable and practicable
Success drivers are the number of participants who communicate, the number of exchanged messages, interactivity, and reciprocity (Preece, 2001)
In Wikipedia, the success of an article can be seen as its quality (Kittur & Kraut, 2008); an article has to meet certain requirements to be assigned a level in a six-level quality system, ranging from "stub" (almost no content) to "featured article" (best quality)
In product development, conceptualizations such as the market performance of the product, project cycle time, efficiency of the development process, and product quality are used (Clark & Fujimoto, 1990), (Eisenhardt & Tabrizi, 1995), (Sethi, 2000)
In open source projects, quantifications of volume, such as the number of contributors or participants or the number of accesses to the project's product or outcome, are typically used (Crowston et al., 2006), (Iriberri & Leroy, 2009)
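As a rough illustration of how some of Preece's (2001) success drivers could be operationalized on mailing-list data, the sketch below counts participants and messages and computes reciprocity as the share of directed sender-to-recipient ties that are returned. The `Message` type, its fields, and this particular reciprocity definition are assumptions for illustration; interactivity is left out because the slide does not say how it is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical record: who wrote a message and whom it replies to.
    sender: str
    recipient: str


def success_drivers(messages):
    """Simple proxies for three of Preece's (2001) success drivers."""
    participants = {m.sender for m in messages} | {m.recipient for m in messages}
    # Distinct directed ties sender -> recipient, ignoring self-replies.
    ties = {(m.sender, m.recipient) for m in messages if m.sender != m.recipient}
    # A tie counts as reciprocated if the reverse tie also exists (assumed definition).
    reciprocated = sum(1 for (a, b) in ties if (b, a) in ties)
    return {
        "participants": len(participants),
        "messages": len(messages),
        "reciprocity": reciprocated / len(ties) if ties else 0.0,
    }
```

With three messages (ann to bob, bob to ann, ann to cat), two of the three distinct ties are reciprocated.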
10. Open source software (OSS) project GNOME
• Graphical user interface and a development framework for desktop applications
• GNOME is a large collection of libraries and applications rather than a monolithic application (German, 2003)
• Data covered a period of about 8 years of activity from November 1997 until July 2005

Description                    Value (dates MM-DD-YYYY)
Mail repository
  Number of emails             467,639
  Number of senders            34,662
  Date of first email          01-01-1997
  Date of last email           02-10-2007
Code repository
  Number of committers         1,312
  Number of commits            479,678
  Number of files              286,314
  Number of commits (files)    2,456,302
  Date of first commit         12-22-1996
  Date of last commit          08-01-2005
Bug repository
  Number of users              2,706
  Number of bugs               201,068
  Date of first bug            01-01-1999
  Date of last bug             11-18-2005
11.
12. Used data set
• The community hosted over 700 different projects
• Projects differ significantly in their development activity, size, and participation rate
• Projects were included if they satisfied all of the following criteria:
- Continuity of development activity (at least one year)
- Amount of development activity (at least 100 commits)
- Attractiveness of the project for developers (at least 10 committers)
- User interest to participate (at least one community-hosted mailing list)
- Data collected from the different repositories overlap during the analyzed period
• The resulting data set consists of 27 projects
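The inclusion criteria above can be sketched as a simple filter. The `Project` record and its field names are hypothetical, and reading "continuity of at least one year" as at least 365 days between first and last commit is an assumption; the overlap test requires the commit, mail, and bug periods to share at least one common day.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Project:
    # Hypothetical per-project summary; field names are illustrative only.
    name: str
    first_commit: date
    last_commit: date
    commits: int
    committers: int
    mailing_lists: int
    mail_period: tuple  # (first mail date, last mail date)
    bug_period: tuple   # (first bug date, last bug date)


def periods_overlap(*periods):
    """True if all (start, end) periods share at least one common day."""
    latest_start = max(start for start, _ in periods)
    earliest_end = min(end for _, end in periods)
    return latest_start <= earliest_end


def is_included(p):
    """Apply the five inclusion criteria from the slide to one project."""
    commit_period = (p.first_commit, p.last_commit)
    return (
        (p.last_commit - p.first_commit).days >= 365  # assumed reading of "at least one year"
        and p.commits >= 100        # amount of development activity
        and p.committers >= 10      # attractiveness for developers
        and p.mailing_lists >= 1    # user interest to participate
        and periods_overlap(commit_period, p.mail_period, p.bug_period)
    )
```

Selecting the sample is then a list comprehension such as `[p for p in projects if is_included(p)]`.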
13. Social Dimension
• Coordination needs network
- Computation of a coordination needs network for each project as
(Task Assignment × Task Dependency) × Transpose(Task Assignment)
(Cataldo et al., 2008)
- Task assignment: which individuals are working on which tasks
- Task dependency: relationships or dependencies among tasks
• Communication network
- Construction of a collection of communication networks for each project from the project's mailing list
- Construction of a communication network of the whole OPS by aggregating the project-level communication networks into one
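Both social networks can be sketched in a few lines of plain Python. Following the formula above, a person-by-task assignment matrix `ta` and a task-by-task dependency matrix `td` yield a person-by-person coordination needs matrix. For the communication network, the input format `(message_id, sender, in_reply_to)` and the rule "link a reply's author to the author of the replied-to message" are assumptions; aggregation to the OPS level is shown as summing edge weights.

```python
from collections import Counter


def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def coordination_needs(ta, td):
    """(TA x TD) x TA^T: entry (i, j) > 0 means persons i and j need to coordinate."""
    return matmul(matmul(ta, td), transpose(ta))


def communication_network(messages):
    """Undirected weighted edges between a reply's author and the replied-to author.

    `messages` holds hypothetical (message_id, sender, in_reply_to) tuples;
    in_reply_to is None for thread starters.
    """
    author = {mid: sender for mid, sender, _ in messages}
    edges = Counter()
    for _, sender, parent in messages:
        if parent in author and author[parent] != sender:
            edges[frozenset((sender, author[parent]))] += 1
    return edges


def aggregate(networks):
    """Merge project-level networks into one OPS-level network by summing weights."""
    total = Counter()
    for net in networks:
        total.update(net)
    return total
```

In the small case below, person 0 works on task 0, person 1 on task 1, and the two tasks depend on each other, so the two persons need to coordinate.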
14. Technical Dimension
• Syntactic dependency network
- Examination of the source code to extract data-related dependencies (e.g., a particular data structure is modified by one function and used in another function) and functional dependencies (e.g., method A calls method B) between source code files during the period between two releases of the GNOME distribution
• Logical dependency network
- Construction of the logical dependency network from the sets of source code files that were modified as part of development tasks performed during the period between two releases of the GNOME distribution
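A minimal sketch of both technical networks, assuming the raw data have already been extracted for one release interval. For logical dependencies, `tasks` is a hypothetical mapping from a task id to the set of files it modified, and two files are linked once per task that changed both. For syntactic dependencies, function-level call pairs are lifted to file-level edges through a hypothetical `defined_in` map; only the functional (call) case is shown, not data-related dependencies.

```python
from collections import Counter
from itertools import combinations


def logical_dependencies(tasks):
    """Co-change network: edge weight = number of tasks that modified both files."""
    edges = Counter()
    for files in tasks.values():
        for a, b in combinations(sorted(set(files)), 2):
            edges[(a, b)] += 1
    return edges


def syntactic_dependencies(calls, defined_in):
    """Lift (caller, callee) function pairs to directed file-level edges."""
    edges = Counter()
    for caller, callee in calls:
        src, dst = defined_in[caller], defined_in[callee]
        if src != dst:  # calls within one file do not create a file dependency
            edges[(src, dst)] += 1
    return edges
```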
15. Results
• Successful projects benefit from interaction patterns that are able to
disseminate information to most of the project participants
while minimizing redundant interconnections
• Successful projects exhibit a continuously active core group that
is able to integrate all members of the project or of the developed
software
• Project success depends on its members occupying different
structural positions within the network as a mechanism to balance
the benefits and limitations of belonging solely to the core or the
periphery
• When task dependencies are partitioned among separate
clusters of highly interdependent sets of individuals, projects are
more likely to succeed
• Modular technical structures (those with independent clusters of
highly interdependent parts) are an important success driver for
online production systems
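One common way to quantify how modular a network is (independent clusters of highly interdependent parts) is Newman's modularity Q. The sketch below computes Q for a given partition of an undirected, unweighted dependency graph; it illustrates the concept and is not necessarily the measure used in the study.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman's modularity Q for an undirected, unweighted graph.

    Q = sum over clusters c of [L_c / m - (d_c / (2m))^2], where L_c is the
    number of edges inside c, d_c the total degree of c, and m the edge count.
    """
    cluster_of = {node: i for i, nodes in enumerate(partition) for node in nodes}
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[cluster_of[u]] += 1
        degree[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            inside[cluster_of[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in range(len(partition)))
```

Two triangles joined by a single edge give Q = 5/14 for the two-cluster split and Q = 0 when everything is put into one cluster, so higher values indicate a more modular structure.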
16. Mirroring hypothesis in online production systems
using temporal network analysis (research in progress)
17. Co-evolution of social and technical architectures
• Social architecture should reflect the technical architecture of a
system and vice versa in order to improve the degree of innovation
or to reduce the coordination needs
(Conway, 1968), (Baldwin & Clark, 2000), (Cataldo et al., 2008)
• Open collaborative communities are geographically distributed;
therefore, their technical architecture should be modular (e.g., Moon &
Sproull, 2000)
• In the context of OSS, a modular technical architecture increases
incentives to join and decreases free riding (Baldwin & Clark, 2006), (West &
O'Mahony, 2008)
• BUT recent empirical work has shown that this hypothesis can only
be partly supported in open collaborative settings (Colfer & Baldwin, 2010)
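A common way to test whether the social architecture mirrors the technical one is socio-technical congruence (Cataldo et al., 2008): the share of coordination needs, derived from the technical dependencies, that are matched by actual coordination, for example a communication tie. The sketch below assumes both are given as sets of unordered person pairs; the helper that reads pairs off a person-by-person matrix is hypothetical, and returning `None` when there are no coordination needs is a convention chosen here.

```python
def pairs_from_matrix(matrix, people):
    """Unordered pairs (i != j) with a nonzero entry in a person-by-person matrix."""
    return {
        frozenset((people[i], people[j]))
        for i in range(len(people))
        for j in range(i + 1, len(people))
        if matrix[i][j] or matrix[j][i]
    }


def congruence(needs, actual):
    """|needs & actual| / |needs|; undefined (None) when there are no needs."""
    if not needs:
        return None
    return len(needs & actual) / len(needs)
```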
18. Requirements for model description
• Networks are used to describe communities; therefore, the relations
between people (density of links) should be used as a measure
• Networks evolve over time; therefore, a temporal model is
required
• Open collaborative communities have a large membership base;
therefore, the algorithm should be able to deal with large networks
• Complete knowledge about the networks is often not available;
therefore, the algorithm should detect local communities
• People are often actively involved in different communities;
therefore, the algorithm should allow overlapping communities
19. Brief overview of existing approaches
• Discrete approach to considering time in graphs (Moody, 2005)
- Cross-sectional analysis of graphs, where the main focus lies on the changes
of network stages (e.g., (Cortes, 2003), (Sun, 2007))
- Two approaches to discretizing the interactions: (a) the cumulative approach and
(b) the time window approach
• Continuous approach to considering time in graphs (Moody, 2005)
- Each single interaction with a start and end date is considered (e.g.,
(Kumar, 2003), (Priebe, 2005))
• Describing evolution in networks at the group level
- Network quality function (Mucha et al., 2010)
- Dynamic tensor analysis (Sun et al., 2006)
- Evolutionary spectral clustering (Chi et al., 2007)
- Clique percolation method (Palla et al., 2005)
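The two discrete ways of slicing interactions into snapshots differ only in where each snapshot starts. The sketch below assumes interactions arrive as hypothetical `(day, u, v)` tuples and uses half-open windows of a fixed `timedelta`; the time-window approach keeps only the interactions inside each window, while the cumulative approach keeps everything since the overall start.

```python
from datetime import date, timedelta


def snapshots(interactions, start, end, window, cumulative=False):
    """Group (day, u, v) interactions into consecutive snapshots of undirected edges.

    Time-window approach: each snapshot holds interactions in [t, t + window).
    Cumulative approach: each snapshot holds interactions in [start, t + window).
    """
    result = []
    t = start
    while t < end:
        upper = min(t + window, end)
        lower = start if cumulative else t
        edges = {
            frozenset((u, v))
            for day, u, v in interactions
            if lower <= day < upper and u != v
        }
        result.append((t, edges))
        t = upper
    return result


if __name__ == "__main__":
    data = [(date(2005, 1, 10), "a", "b"), (date(2005, 2, 10), "b", "c")]
    for t, edges in snapshots(data, date(2005, 1, 1), date(2005, 3, 1), timedelta(days=30), cumulative=True):
        print(t, sorted(tuple(sorted(e)) for e in edges))
```

The cumulative variant smooths out short-term fluctuations but makes later snapshots ever denser; the time-window variant shows the current structure but depends on the chosen window length (three months in the experiment below).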
20. Experimental setup
• Data set: OSS project Epiphany (web browser)
• Communication network based on the mailing list repository
• One time frame covers three months of activity
• Steps of CPM
- Locate all complete subgraphs, i.e., cliques, that are not part of a larger complete subgraph
- Identify communities based on the clique-clique overlap matrix
- Specify the "optimal" percolation structure

Description              Value
# months                 44
# senders                688
# mails                  8,352
# threads                1,294
# committers             208
# commits                5,898
# files                  21,223
# added LOC              957,091
# removed LOC            748,956
mails per person         12.00
persons per thread       6.45
commits per person       28.36
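The first two CPM steps can be sketched in plain Python, following the clique percolation method of Palla et al. (2005): find maximal cliques (here with Bron-Kerbosch and pivoting), treat cliques of at least k nodes as adjacent when they share at least k - 1 nodes, and merge each connected group of cliques into one community, so a node can belong to several communities. Choosing the "optimal" k is left to the caller, for example by comparing k = 3, 4, 5; the helper `share_not_included` is a hypothetical addition that reports the fraction of nodes outside every community.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; adj maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def k_clique_communities(edges, k):
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    # Step 1: maximal cliques with at least k nodes.
    cliques = [frozenset(c) for c in maximal_cliques(adj) if len(c) >= k]
    # Step 2: clique-clique overlap; cliques sharing >= k - 1 nodes are joined (union-find).
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)
    # Each connected group of cliques forms one (possibly overlapping) community.
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return [frozenset(g) for g in groups.values()]


def share_not_included(edges, communities):
    """Fraction of nodes that belong to no community."""
    nodes = {n for e in edges for n in e}
    covered = set().union(*communities)
    return len(nodes - covered) / len(nodes)
```

Two triangles that share a single node stay separate communities for k = 3 (the shared node belongs to both), while two triangles sharing an edge percolate into one community.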
21. Selected community and network characteristics
[Figure: network and community characteristics for snapshots 1 to 10: number of nodes and edges (log scale), two further panels whose labels are not recoverable, and the percentage of non-included nodes for k = 3, 4, 5]
22. Community development based on social interactions
[Figure: community size per snapshot (1 to 10), split into new, new (leaving), old, and old (leaving) members]
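The figure's legend suggests a simple bookkeeping of community membership across snapshots. The sketch below uses an assumed reading of that legend: "old" members were already in the community in the previous snapshot, "new" members were not, and "(leaving)" marks members absent from the community in the following snapshot. It also assumes the community has already been matched across snapshots, which is a separate problem not shown here.

```python
def classify_members(previous, current, following):
    """Count members of one community in the current snapshot by assumed category."""
    counts = {"new": 0, "new (leaving)": 0, "old": 0, "old (leaving)": 0}
    for member in current:
        label = "old" if member in previous else "new"
        if member not in following:
            label += " (leaving)"
        counts[label] += 1
    return counts
```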
23. Recent and future research challenges
24. Conclusions
• Considering time by describing the two dimensions helps to reveal
dependencies between development patterns and the specific life
cycle stage of an OPS
• Success of an online production system is related to the social AND
technical dimension; thus, describing both dimensions is a
prerequisite for understanding and improving existing production
processes
• Other research has shown that organizational and technical
structures are related; hence, the existing interdependencies in
OPSs need to be explored
25. Thank you.
Acknowledgements
Co-authors: Marcelo Cataldo, James D. Herbsleb
26. References
• C.Y. Baldwin and K.B. Clark. Design Rules: The Power of Modularity, Volume 1. MIT Press, Cambridge, MA, USA, 1999.
• C.Y. Baldwin and K.B. Clark. The Architecture of Participation: Does Code Architecture Mitigate Free Riding in the Open Source Development Model? Management Science, 52(7), 2006.
• Y. Benkler and H. Nissenbaum. Commons-based Peer Production and Virtue. Journal of Political Philosophy, 14(4):394–419, 2006.
• M. Cataldo, J.D. Herbsleb, and K.M. Carley. Socio-technical congruence: a framework for assessing the impact of technical and work dependencies on software development productivity. In Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, pages 2–11, Kaiserslautern, Germany. ACM, 2008.
• Y. Chi, S. Zhu, X. Song, J. Tatemura, and B.L. Tseng. Structural and temporal analysis of the blogosphere through community factorization. In KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 163–172, San Jose, California, USA. ACM, 2007.
• K. Clark and T. Fujimoto. Product Development Performance. Harvard Business School Press, 1991.
• M.E. Conway. How do Committees Invent? Datamation, 14(4):28–31, 1968.
• C. Cortes, D. Pregibon, and C. Volinsky. Computational Methods for Dynamic Graphs. Journal of Computational and Graphical Statistics, 12(4):950–970, 2003.
• K. Crowston, J. Howison, and H. Annabi. Information systems success in free and open source software development: theory and measures. Software Process: Improvement and Practice, 11(2):123–148, 2006.
• A. Deshpande and D. Riehle. The Total Growth of Open Source. In Proceedings of the Fourth Conference on Open Source Systems (OSS 2008), pages 197–209. Springer Verlag, 2008.
• K. Eisenhardt and B. Tabrizi. Accelerating adaptive processes: Product innovation in the global computer industry. Administrative Science Quarterly, 40(1):84–110, 1995.
• A. Iriberri and G. Leroy. A life-cycle perspective on online community success. ACM Computing Surveys, 41(2):1–29, 2009.
• B. Kanefsky, N.G. Barlow, and V.C. Gulick. Can Distributed Volunteers Accomplish Massive Data Analysis Tasks? In 32nd Annual Lunar and Planetary Science Conference, 2001.
• A. Kittur and R.E. Kraut. Harnessing the wisdom of crowds in Wikipedia: quality through coordination. In Proceedings of CSCW, pages 37–46, 2008.
• R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In WWW '03: Proceedings of the 12th International Conference on World Wide Web, pages 568–576, New York, NY, USA. ACM, 2003.
27. References (cont.)
• J. Moody, D. McFarland, and S. Bender-deMoll. Dynamic Network Visualization. American Journal of Sociology, 110(4):1206–1241, 2005.
• J.Y. Moon and L. Sproull. Essence of Distributed Work: The Case of the Linux Kernel. First Monday, 5(11), 2000.
• P.J. Mucha, T. Richardson, K. Macon, M.A. Porter, and J.-P. Onnela. Community Structure in Time-Dependent, Multiscale, and Multiplex Networks. Science, 328(5980):876–878, 2010.
• C. Müller-Birn, M. Cataldo, P. Wagstrom, and J.D. Herbsleb. Success in Online Production Systems: A Longitudinal Analysis of the Socio-Technical Duality of Development Projects. Technical Report CMU-ISR-10-129, 2010.
• S. O'Mahony and F. Ferraro. The emergence of governance in an open source community. Academy of Management Journal, 50(5):1079–1106, 2007.
• G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, 2005.
• J. Preece. Online Communities: Designing Usability, Supporting Sociability. John Wiley & Sons, 2000.
• J. Preece. Sociability and usability in online communities: determining and measuring success. Behaviour & Information Technology, 20(5):347–356, 2001.
• C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park. Scan Statistics on Enron Graphs. Computational and Mathematical Organization Theory, 11(3):229–247, 2005.
• J. Riedl. The Promise and Peril of Social Computing. Computer, 44(1):93–95, 2011.
• R. Sethi. New product quality and product development teams. Journal of Marketing, 64:1–14, 2000.
• L. Sproull and S. Kiesler. Connections: New Ways of Working in the Networked Organization. MIT Press, Cambridge, MA, 1995.
• J. Sun, D. Tao, and C. Faloutsos. Beyond streams and graphs: dynamic tensor analysis. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 374–383, New York, NY, USA. ACM, 2006.
• J. Sun. Incremental pattern discovery on streams, graphs and tensors. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2007.
• D. Tapscott and A.D. Williams. Wikinomics: How Mass Collaboration Changes Everything. Portfolio Trade, 2008.
• J. West and S. O'Mahony. The Role of Participation Architecture in Growing Sponsored Open Source Communities. Industry and Innovation, 15(2):145–168, 2008.