Evolution of the mashup ecosystem by copying
1. Evolution of the mashup ecosystem by copying
Michael Weiss, Solange Sari
Technology Innovation Management (TIM)
www.carleton.ca/tim
weiss@sce.carleton.ca Mashups 2010 1
2. Objective
• Mashups are applications that combine data and
services provided through APIs with user data
• New application development model: opportunistic
programming, uses a bricolage approach
• Creation of mashups supported by an ecosystem of
data providers, mashup platforms, and users
• Research questions
– How do mashup developers select APIs?
– How do mashup developers learn to develop mashups?
3. Relevance
• Users/platforms: can benefit from/offer tools that
better support the way users work
• Directory providers: their role is to facilitate the
selection of APIs and learning of developers
• Data providers: need to understand which other APIs
their APIs are most often used with (interoperability)
4. Previous work
• Examined structure and growth of mashup
ecosystem using visualization and network analysis
to identify members and their relationships
• Opportunistic programming studies how developers
use online resources in problem solving
• Research on innovation: (re)combination shortens
learning curve, modularity allows mix-and-match
• Models of network growth: preferential attachment
• Copying and duplication mechanisms in describing
the growth of the web and biological networks
5. Hypothesis
• To answer the research questions, we examine to
what degree developers create mashups by copying
other mashups, that is, by copying a mashup “blueprint”
[Figure: number of copies per mashup. Cumulative probability versus number of copies, log-log axes. Labeled points include GoogleMaps, YouTube, GoogleMaps/YouTube, GoogleMaps/Twitter, Flickr/GoogleMaps, Amazon/Flickr, Amazon/GoogleMaps/YouTube, and "Not copied".]
Snapshot of ProgrammableWeb on 08/16/10:
                        Count   Share
Mashups                  4983    100%
Not copied               1528     31%
Blueprints                341      7%
Copies of blueprints     3114     62%
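The breakdown into not-copied mashups, blueprints, and copies can be illustrated with a small counting sketch. It assumes that a blueprint is an API combination used by more than one mashup, and that all but one of the mashups sharing it count as copies; the slides do not spell out this definition. The function name `blueprint_stats` and the toy data are hypothetical.

```python
from collections import Counter

def blueprint_stats(mashups):
    """Classify mashups by how often their exact API combination occurs.

    Assumption: a combination used once is "not copied"; a combination
    used k > 1 times contributes 1 blueprint and k - 1 copies.
    """
    combos = Counter(frozenset(apis) for apis in mashups)
    return {
        "mashups": len(mashups),
        "not_copied": sum(1 for k in combos.values() if k == 1),
        "blueprints": sum(1 for k in combos.values() if k > 1),
        "copies": sum(k - 1 for k in combos.values() if k > 1),
    }

# Toy data, not taken from ProgrammableWeb.
toy = [
    ["GoogleMaps"],
    ["GoogleMaps"],
    ["GoogleMaps", "Flickr"],
    ["Flickr", "GoogleMaps"],   # same combination, different order
    ["Twitter"],
    ["GoogleMaps"],
]
stats = blueprint_stats(toy)
```

Under this reading the three categories always add up to the total number of mashups, as they do in the snapshot (1528 + 341 + 3114 = 4983).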
6. Copying model
• Mashup ecosystem as network of mashups and APIs:
a link indicates that a mashup uses an API
• Assumption: mashups all have m APIs
• Initialize network:
– Create m0 ≥ m APIs, one mashup
• Grow network from t=m0 + 1 to t=N:
– Add new API with probability p
– With probability 1-p, choose a mashup as a template
– For each API in template, copy the API with probability α, or
choose a new API at random with probability 1-α
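The growth steps above can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: it starts with m0 = m APIs, counts both APIs and mashups toward the network size, and redraws an API at random if it would repeat within one mashup (the slides do not say how repeats are handled). The names `simulate` and `api_degrees` and the default values for `p` and `alpha` are hypothetical.

```python
import random
from collections import Counter

def simulate(n_total, m=2, p=0.2, alpha=0.8, seed=0):
    """Grow a mashup/API network by copying until it has n_total nodes."""
    rng = random.Random(seed)
    apis = list(range(m))      # assumption: m0 = m initial APIs, ids 0..m-1
    mashups = [list(apis)]     # the single initial mashup uses all of them
    next_api = m
    while len(apis) + len(mashups) < n_total:
        if rng.random() < p:
            # With probability p, add a new API.
            apis.append(next_api)
            next_api += 1
            continue
        # With probability 1 - p, add a mashup based on a random template.
        template = rng.choice(mashups)
        new = []
        for api in template:
            # Copy the template's API with probability alpha,
            # otherwise pick an API uniformly at random.
            pick = api if rng.random() < alpha else rng.choice(apis)
            while pick in new:  # assumption: no API repeats within a mashup
                pick = rng.choice(apis)
            new.append(pick)
        mashups.append(new)
    return apis, mashups

def api_degrees(mashups):
    """Number of mashups that use each API."""
    return Counter(api for mashup in mashups for api in mashup)

apis, mashups = simulate(500, m=2, p=0.2, alpha=0.8, seed=1)
degrees = api_degrees(mashups)
```

With `alpha=1.0` and `p=0.0` every new mashup is a full copy of the initial one, which makes the two extremes of the copying factor easy to check by hand.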
7. Example
• Initial network: APIs 1 and 2, mashup 3
• Thin solid lines indicate random selection
[Diagram: APIs 1 and 2 and mashup 3. Legend: API, Mashup.]
8. Example
• Growth: add a new mashup (4)
• Thick solid lines indicate “copies” relationship
• Thin dashed lines indicate copying
[Diagram: APIs 1 and 2, mashup 3, and new mashup 4, a full copy of mashup 3. Legend: API, Mashup.]
9. Example
• Growth: add a new API (5)
[Diagram: APIs 1, 2, and new API 5; mashup 3 and mashup 4 (full copy). Legend: API, Mashup.]
10. Example
• Growth: add a new mashup (6)
• Thin solid lines indicate random selection
[Diagram: APIs 1, 2, and 5; mashup 3, mashup 4 (full copy), and new mashup 6 (partial copy). Legend: API, Mashup.]
11. Research method
• Calibrate simulation parameters
– N: combined actual number of APIs and mashups
– m = 2: good approximation of average actual APIs / mashup
– p: number of APIs / N (all as of 08/16/10)
• Simulate mashup ecosystem evolution
– Vary α over range 0.0 to 1.0, keep m = 2 fixed
– Run each simulation multiple times and terminate when 95%
confidence interval is sufficient for the optimization
• Determine best fit of simulated distribution of
mashups / API with actual using two fitting methods:
sum of squared error fit, and power law fit
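The sum of squared error fit can be sketched as a grid search over the copying factor. This is an illustration under assumptions the slides do not confirm: both distributions are compared rank by rank, and the shorter one is padded with zeros when the number of APIs differs. The names `sse`, `best_alpha`, and `simulate_degrees` are hypothetical; `simulate_degrees` stands in for any function that returns simulated mashups-per-API counts for a given α.

```python
def sse(actual, simulated):
    """Sum of squared differences between two rank-ordered distributions."""
    a = sorted(actual, reverse=True)
    s = sorted(simulated, reverse=True)
    n = max(len(a), len(s))
    a += [0] * (n - len(a))   # assumption: missing ranks count as zero
    s += [0] * (n - len(s))
    return sum((x - y) ** 2 for x, y in zip(a, s))

def best_alpha(actual, simulate_degrees, alphas):
    """Return the copying factor whose simulated distribution has the lowest SSE."""
    return min(alphas, key=lambda alpha: sse(actual, simulate_degrees(alpha)))

# Toy stand-in for a simulation: the top API's count scales with alpha.
toy_actual = [8, 5]
toy_model = lambda alpha: [10 * alpha, 5]
chosen = best_alpha(toy_actual, toy_model, [0.0, 0.5, 0.8, 1.0])
```

In practice each candidate α would be evaluated on the average of several simulation runs rather than a single one, as the method above describes.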
12. Actual distribution
• Distribution of mashups / API follows Zipf’s law:
plotting frequency of mashups relative to rank results
in a line with slope close to -1 in a log-log plot
[Figure: actual number of mashups versus API rank, log-log axes. Labeled APIs: GoogleMaps, Flickr, Twitter, YouTube. Slope of fitted line: -0.990.]
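The slope of a rank-frequency plot can be estimated with a least-squares line through the log-log points. The slides do not state which estimator produced the reported slope, so ordinary least squares is an assumption here, and `zipf_slope` is a hypothetical name.

```python
import math

def zipf_slope(counts):
    """Least-squares slope of log(count) against log(rank)."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic data that follows Zipf's law exactly: count = 1000 / rank.
ideal = [1000 / rank for rank in range(1, 51)]
slope = zipf_slope(ideal)
```

For the synthetic data the slope comes out at -1 up to floating-point error; real mashups-per-API counts would give a value near -1 only if they follow Zipf's law.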
13. Sum of squared error fit
• Underestimates contribution of top-ranked API
• Overestimates the number of APIs used by at least
one mashup by 45% (1020 vs 703)
[Figure, left panel: sum of squared error versus copying factor (α). Right panel: number of mashups versus rank, log-log axes, actual versus simulated (sum of squared error) at α = 0.798.]
14. Power law fit
• Slightly overestimates contribution of top API
• Overestimates the number of APIs used by at least
one mashup by 22% (859 vs 703)
[Figure, left panel: power law coefficient error versus copying factor (α). Right panel: number of mashups versus rank, log-log axes, actual versus simulated (power law) at α = 0.855.]
15. Cumulative contribution of APIs
• Sum of squared error fit underestimates number of
APIs that contributed to 50% of API uses
• Power law fit overestimates number of APIs that
contributed to 50% of API uses
[Figure: cumulative contribution of APIs versus rank, logarithmic rank axis.]
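The comparison above rests on how many top-ranked APIs are needed to reach half of all API uses. A minimal sketch of that count, with the hypothetical name `apis_for_share` and toy numbers rather than the actual data:

```python
def apis_for_share(counts, share=0.5):
    """Smallest number of top-ranked APIs whose uses reach the given share."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    running = 0
    for rank, count in enumerate(ranked, start=1):
        running += count
        if running >= share * total:
            return rank
    return len(ranked)

# Toy distributions: a concentrated one and a flatter one.
concentrated = apis_for_share([50, 30, 10, 10])
flatter = apis_for_share([40, 30, 20, 10])
```

A fit that concentrates too many uses in the top APIs reaches the 50% mark at a lower rank than the actual data, and a flatter fit reaches it at a higher rank.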
16. Discussion
• Both methods obtained their best fit for a high
copying factor: this suggests that most mashups are
created by modifying an existing blueprint
• Power law fit more closely approximates the actual Zipf
distribution; however, sum of squared error fit offers a
better match of the actual degrees of APIs in the midrange
17. Insights for stakeholders
• Confirmation of practices directories follow:
– List combinations of APIs into mashups
– Keep track of developers of mashups
– Provide tutorials on mashup development
• Directory providers should make blueprints more
apparent: also list frequency of blueprints
• Users benefit as they can look at blueprints to select
APIs that work well together and as examples
• API providers learn which other APIs are frequently
combined with their API: incentive to interoperate
18. Conclusion
• Results indicate that copying plays a significant role
in the evolution of the mashup ecosystem
• However, we cannot rule out other factors that could
explain how the mashup ecosystem grows
• Copying hypothesis in line with current thinking about
innovation: e.g., Arthur’s The Nature of Technology
• Other current and future work:
– Extend simulation to include mashups of different sizes
– Test copying hypothesis empirically: we are currently examining
hereditary relationships between mashups
– Examine link between copying and diversity of ecosystem