1. Web Science & Technologies
University of Koblenz ▪ Landau, Germany
Managing Social Communities
Steffen Staab
Acknowledgements to ROBUST Project team & WEST Team,
in particular
K. Dellschaft, J. Kunegis, F. Schwagereit
2. Institut WeST – Web Science & Technologies
Semantic Web Web Retrieval Interactive Web Multimedia Web Software Web
eGovernment eMedia eScience eOrganizations ePerson
Institute for Computer Science ▪ Institute for Information Systems ▪ Leibniz Institute for Social Sciences (GESIS)
Steffen Staab Web Science Doctoral
staab@uni-koblenz.de Summer School 2
3. Plan for this Talk
1 Web
2 Science
4. Social Communities
…are everywhere
5. Risks and Opportunities
Risks: bad content quality, social ill behavior, … jeopardize business value
Opportunities: open innovation, improved user support, … increase business value
Data Storage and Processing: scalability, heterogeneity, response time
Content, User & Networks Analysis: understanding
Business Value: product support & innovation, CRM, expertise management, marketing, advertising
Online Communities: Intranet, Extranet, Internet
6. Large-scale Testbeds
SAP (B2B): Community Network, Business Partner Network
  2009: 1.5M users, 150K accesses/day
  2013: 5M users, 1200K accesses/day
Polecat (C2C): Online Marketing, CRM for IT
  2009: …
  2013: millions of posts/day, 1TB data/day
IBM (E2E): Developer Network, Corporate Knowledge Management
  2009: 99K accounts
  2013: 800K accounts
7. SAP Business Partner Use Case
SAP Developer Network
                                         2007    2009    2013
Posts per day                            5000    6000    7000
Size of user generated content (posts)   1M      4M      10.0M
Number of users                          1M      1.7M    4.8M
8. ROBUST: IBM Employee Use Case
                        Created per day           Number of users
Business Data           2007    2009    2013      2007      2009      2013
IBM Activities Entry    700     2750    5000      53200     143600    200000
IBM Blogs Entries       120     30      60        34600     77750     100000
IBM Communities         3       23      50        3000      181950    250000
IBM Bookmarks           800     900     1000      8500      22400     50000
IBM Wikis               NA      40      100       NA        35450     100000
IBM Files               NA      290     1000      NA        45160     100000
IBM Overall             1623    4033    7210      500000*   500000*   500000*
9. Risks in Online Communities
Definition: Risk
• Likelihood: probability of an event occurring
• Impact of the event occurring
Risk management
• Process for managing costs, benefits and likelihoods
• Detect high-impact risks in time, even if they generate expensive false alarms
• Ignore very low-impact risks, even if they can be reliably detected
Types of risks
• Non-compliance with the community policies/polity
• Scamming or spamming behavior
• Lower involvement and productivity
• Decrease of user satisfaction
• Loss of community dynamics
Examples
SAP: SCN Award Points Scamming
• Experts' reputation decreases
• Business users leave the forum
Web: Public communities
• Death of TechCrunch forum due to spam and lack of management
Loss of 1% experts: loss of high revenue
Loss of 10% lurkers: low impact
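The cost-benefit rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming a risk is scored as likelihood times impact and compared against a fixed monitoring cost that already includes false alarms. The class, the function and all numbers are hypothetical and not taken from the ROBUST project.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: float  # probability of the event occurring, in [0, 1]
    impact: float      # loss if the event occurs (arbitrary cost units)

    def expected_loss(self) -> float:
        # Assumed scoring rule: likelihood times impact.
        return self.likelihood * self.impact


def worth_monitoring(risk: Risk, monitoring_cost: float) -> bool:
    # Monitor only when the expected loss exceeds the cost of detection,
    # where the cost is assumed to include handling false alarms.
    return risk.expected_loss() > monitoring_cost


# Hypothetical numbers, chosen only to mirror the two examples on the slide.
expert_loss = Risk("loss of 1% experts", likelihood=0.05, impact=1000.0)
lurker_loss = Risk("loss of 10% lurkers", likelihood=0.30, impact=10.0)
```

With a monitoring cost of 20 units, the expert risk is worth monitoring although it is rare, while the lurker risk is ignored although it is more likely.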
10. Communities: dynamics and confidentiality
ROBUST supports decision making for users, hosts and service providers
Managing growth & decline
Identify, encourage, safeguard core users
Social matching
Define/maintain etiquette and policies
Manage negative behavior and conflicts
Content matching
Recognize, categorize decline and growth
Redirect users to other communities
Merging communities
Cross community topic detection to stimulate inter-community interactions
Splitting communities
Identification of clusters/compartments of members that can be separated
11. Agenda
• Risks and Opportunities in Social Communities:
the ROBUST project
• Many related Talks in this Summer School
ROBUST partners:
Alani: Monitoring and analysis of social networks
Karnstedt: User churn
Closely related:
Greene: Network Analysis
Bernstein: Scalable infrastructures
But here comes the biased account from work in our institute
12. Plan for this Talk
1 Web
2 Science
13. Image of a black hole
Flickr cc, Jan 7 2009 by thebadastronomer
14. Agenda
• Risks and Opportunities in Social Communities:
the ROBUST project
• Web Science Methodology:
An explanation by analogy with Physics
and some initial (!) applications to online communities
• Modeling dynamic system at micro level,
Understanding collective effects (macro level) arising
from individual behavior (micro level)
• Predicting dynamic system behavior,
recognizing behavior deviating from the model
• Modeling dynamic system behavior at the macro level
• Controlling dynamic system behavior by collective action
15. Better understanding of the tagging process
Cooperative classification of resources
Which factors influence the tagging process?
• Background knowledge of the user?
• Tag assignments of other users?
Hypothesis: Tagging involves imitation of other users AND
selection of tags from background knowledge of users.
16. Methodology
[Diagram: real tagging behavior, shaped by the user's own knowledge (conceptualization), shared terminology, the user interface and possibly something else, is compared by its statistics with simulated tagging behavior. The simulation comes from a joint stochastic model that combines a model of own knowledge, a model of sharing and a model of user interface influence.]
17. Components of Analysis
Properties of Tag Streams (observations of folksonomies in the real world)
• Stream view of folksonomies
• Co-occurrence streams
• Resource streams
Dynamic model for Tagging Systems (stochastic models of influence)
• Simulating background knowledge
• Simulating tag imitation
Simulation Results (which models best fit reality?)
• Co-occurrence streams
• Resource streams
18. Stream Views of a Folksonomy
Folksonomies:
Vertices: Users, tags, resources
Edges: Tag assignments
Postings:
• Tag assignments of a user to a single resource
• Can be ordered according to their time-stamp
19. Co-occurrence Streams
Co-occurrence Streams:
All tags co-occurring with a given tag in a posting
Ordered by posting time
Co-occurrence stream for 'apple':
{mackz, r1, {apple, tree}, 13:25}
{klaasd, r2, {apple, mac, ibook}, 13:26}
{mackz, r2, {apple, macintosh, stevejobs}, 13:27}
tree, mac, ibook, macintosh, stevejobs
Tag |Y| |U| |T| |R|
ajax 2.949.614 88.526 41.898 71.525
blog 6.098.471 158.578 186.043 557.017
xml 974.866 44.326 31.998 61.843
20. Properties of Co-occurrence Streams – Tag Growth
[Figure: tag growth in co-occurrence streams; annotation: linear growth]
21. Properties of Co-occurrence Streams – Tag Frequencies
[Figure: tag frequencies in co-occurrence streams; annotation: power law]
22. Resource Streams
Resource Streams:
All tags assigned to a resource
Ordered by posting time
Resource stream for 'r2':
{mackz, r1, {apple, tree}, 13:25}
{klaasd, r2, {apple, mac, ibook}, 13:26}
{mackz, r2, {apple, macintosh, stevejobs}, 13:27}
apple, mac, ibook, apple, macintosh, stevejobs
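Both stream views can be derived from the same list of postings. The sketch below reproduces the 'apple' co-occurrence stream and the 'r2' resource stream from the three example postings. The `Posting` type and the function names are illustrative assumptions, not part of any tagging-system API.

```python
from typing import List, NamedTuple, Tuple


class Posting(NamedTuple):
    user: str
    resource: str
    tags: Tuple[str, ...]
    time: str  # "HH:MM"; string order equals time order within one day


postings = [
    Posting("mackz", "r1", ("apple", "tree"), "13:25"),
    Posting("klaasd", "r2", ("apple", "mac", "ibook"), "13:26"),
    Posting("mackz", "r2", ("apple", "macintosh", "stevejobs"), "13:27"),
]


def cooccurrence_stream(postings: List[Posting], tag: str) -> List[str]:
    # All tags that co-occur with `tag` in a posting, ordered by posting time.
    stream: List[str] = []
    for p in sorted(postings, key=lambda p: p.time):
        if tag in p.tags:
            stream.extend(t for t in p.tags if t != tag)
    return stream


def resource_stream(postings: List[Posting], resource: str) -> List[str]:
    # All tags assigned to `resource`, ordered by posting time.
    stream: List[str] = []
    for p in sorted(postings, key=lambda p: p.time):
        if p.resource == resource:
            stream.extend(p.tags)
    return stream
```

Note that the resource stream keeps repeated tags (here 'apple' twice), which is what makes tag frequencies per resource measurable.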
23. Properties of Resource Streams – Tag Frequencies
24. Properties of Resource Streams – Tag Frequencies
25. Simulating the Evolution of Tag Streams
26. Simulating tag streams
Which of my concepts represent this web page? How do I tag this web page?
Inspiration for conceptualization from:
1. Most popular tags
2. Most recently used tags
3. Tags used for this resource
4. Tags co-occurring with similar text documents
5. Creating completely new tags
6. …
Which combination of inspirations produces the same statistics as those observed for Delicious?
27. The Delicious User Interface
Imitating previous tag assignments:
Recommended tags: Intersection of tags of a user and tags already
assigned to the resource.
Your tags: Tags of the user.
Popular tags: 7 most popular tags assigned to the resource.
28. Simulating a Tag Stream
Start with empty tag stream
Each simulation step appends a new tag assignment
Simulation of a single tag assignment:
p(w|t): Probability of selecting word w for topic t.
Modeled by word distributions in a topic centered
text corpus.
n: Number of visible previous tags.
h: Maximal number of previous tag assignments
used for determining ranking of the n distinct tags.
29. Modeling Background Knowledge
[Figure labels: Text Corpora, Del.icio.us]
PBK: Probability of selecting from background knowledge
p(w|t): Probability of selecting word w for topic t. Modeled by word
distributions in a topic centered text corpus.
p(w|r): Probability of selecting word w for resource r.
30. Modeling Tag Imitation
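[Diagram: the tag assignment at step t is drawn with probability PBK from background knowledge and with probability 1 − PBK from the n top-ranked tags among the previous tag assignments t-1, t-2, …, t-h]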
PI = 1 – PBK: Probability of imitating a previous tag assignment
n: Number of visible top-ranked tags
h: Maximal number of previous tag assignments used for determining
ranking of the n distinct tags
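The two mechanisms can be combined into a small simulation loop. This is a minimal sketch under stated assumptions: `background` is a toy stand-in for p(w|t), tags are ranked by their frequency within the last h assignments, and one of the n visible tags is imitated uniformly at random. The slides do not specify these details, so they are illustrative choices rather than the published model.

```python
import random
from collections import Counter
from typing import Dict, List


def simulate_tag_stream(steps: int, p_bk: float, background: Dict[str, float],
                        n: int, h: int, seed: int = 0) -> List[str]:
    """Append `steps` tag assignments to an initially empty tag stream."""
    rng = random.Random(seed)
    words = list(background)
    weights = [background[w] for w in words]
    stream: List[str] = []
    for _ in range(steps):
        if not stream or rng.random() < p_bk:
            # Background knowledge: draw a word according to p(w|t).
            tag = rng.choices(words, weights=weights, k=1)[0]
        else:
            # Imitation (probability 1 - PBK): rank the distinct tags in the
            # last h assignments and pick one of the n top-ranked tags.
            counts = Counter(stream[-h:])
            visible = [t for t, _ in counts.most_common(n)]
            tag = rng.choice(visible)  # assumption: uniform over visible tags
        stream.append(tag)
    return stream


# Toy word distribution; a real run would estimate it from a text corpus.
toy_background = {"ajax": 0.5, "javascript": 0.3, "web": 0.2}
stream = simulate_tag_stream(200, p_bk=0.2, background=toy_background, n=2, h=50)
```

Tag growth and tag frequencies of the simulated stream can then be compared with those of an observed stream.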
31. Simulation Results
32. Overall Scheme
[Diagram: real tagging behavior, shaped by the user's own knowledge (conceptualization), shared terminology, the user interface and possibly something else, is compared by its statistics with simulated tagging behavior. The simulation comes from a joint stochastic model that combines a model of own knowledge, a model of sharing and a model of user interface influence.]
33. Simulating Co-occurrence Streams
Tag growth:
Influenced by PBK and p(w|t)
Tag Frequencies:
Influenced by PBK, p(w|t), n, h
n: Semantic breadth of a topic (blog: 100 tags,
ajax: 50 tags, xml: 50 tags; Cattuto et al. 2007)
h: No hint for realistic values. Good guesses may be 500
and 1000.
34. Co-occ. Streams – Simulated Tag Growth
35. Co-occ. Stream – Simulated Tag Frequencies
36. Simulating Resource Streams
PI and PBK: Values comparable to co-occurrence streams
p(w|r): Approximated by p(w|t)
n: 7 tags are visible (cf. Delicious user interface)
h: Smaller value than for co-occurrence streams
37. Res. Streams – Simulated Tag Frequencies
38. Lessons learned [Dellschaft+Staab, ACM Hypertext 2008]
Black holes do not only eat mass, they also dissolve by emitting radiation
Imitation AND background knowledge are needed for
explaining properties of tag streams
Probability of imitating previous tag assignments: ~70-90%
Model                  Frequency rank,      Frequency rank,     Tag growth
                       co-occur. streams    resource streams
Polya Urn Model        o                    o                   fixed size
Simon Model            o                    o                   linear
YS Model w/ Memory     +                    o                   linear
Halpin et al. Model    o                    o                   linear
Our Epistemic Model    +                    +                   power-law
39. Solar System
Neptune
Uranus
Jupiter
Saturn
Flickr, cc Sep 1 2008 by Image Editor
40. Agenda
• Risks and Opportunities in Social Communities:
the ROBUST project
• Web Science Methodology:
An explanation by analogy with Physics
and some initial (!) applications to online communities
• Modeling dynamic system at micro level,
Understanding collective effects (macro level) arising from
individual behavior (micro level)
• Predicting dynamic system behavior,
recognizing behavior deviating from the model
• Modeling dynamic system behavior at the macro level
• Controlling dynamic system behavior by collective action
41. Overall Scheme
[Diagram: real tagging behavior, shaped by the user's own knowledge (conceptualization), shared terminology, the user interface and possibly something else, is compared by its statistics with simulated tagging behavior. The simulation comes from a joint stochastic model that combines a model of own knowledge, a model of sharing and a model of user interface influence.]
42. What is our Uranus?
What is this?
43. Uranus = Spam [Dellschaft+Staab, WebSci 2010]
Effect of removing 257 spammers of 12.777 users from the ‘bookmark’ stream
44. Why care? The Bibsonomy Example
Complete snapshot of Bibsonomy system
Manually labeled ground truth of spammers in the data set
Users Tags Resources TAS
Spammers 29,248 297,846 1,197,354 13,258,759
Non-Spammers 2,467 61,154 234,143 816,196
45. Why care? The Delicious Example
Crawled during the TAGora Project
Users Tags Resources TAS
532,938 2,482,850 18,778,566 140,305,446
Number of spammers not known exactly
Estimation based on random sample of 500 users:
With 95% probability: Between 1.972 and 12.949 spammers
Delicious most likely already applies spam detection
Why care about ~ 1.5% spammers in Delicious?
46. Filtering Results (Users)
[Bar chart: Number of Spammers and Non-Spammers; series: Spammer, Non-Spammer; categories 1 to 20; vertical axis 0 to 16000]
47. Filtering Results (Tag Assignments)
[Bar chart: Filtered and unfiltered number of TAS; series: Spam, Non-Spam; categories 1 to 20; vertical axis 0 to 450000]
48. That’s why
Effect of removing 257 spammers of 12.777 users from the ‘bookmark’ stream
49. How statistically significant is the epistemic model for normal users?
50. Lessons learned
Neptune was discovered because it affected the orbit of Uranus
Pluto was discovered because it affected Uranus!
Spammers can be discovered by their behavior,
even if you do not know what kind of spam they are
producing!
51. How do constellations in the sky evolve?
http://www.flickr.com/photos/furious-angel/2142647358/sizes/o/in/photostream/
52. Agenda
• Risks and Opportunities in Social Communities:
the ROBUST project
• Web Science Methodology:
An explanation by analogy with Physics
and some initial (!) applications to online communities
• Modeling dynamic system at micro level,
Understanding collective effects (macro level) arising from
individual behavior (micro level)
• Predicting dynamic system behavior,
recognizing behavior deviating from the model
• Modeling dynamic system behavior at the macro level
• Controlling dynamic system behavior by collective action
53. Example: Network
Person Friendship
54. SUGGESTING WHOM TO LINK TO NEXT
55. Use Networks for Recommendation
Goal: Predict who a person will add as friend
Facebook's algorithm: find friends-of-friends
→ Problem: Rest of the network is ignored!
56. Algebraic Graph Theory
Represent a network by an adjacency matrix A:
$A_{ij} = 1$ when i and j are connected
$A_{ij} = 0$ when i and j are not connected
A is square and symmetric.
Example network on the vertices 1 to 6 with edges 1-2, 2-3, 2-4, 3-4, 4-5, 5-6:
$$
A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
$$
57. Baseline: Friend of a Friend Model
Count the number of ways a person can be found as
the friend of a friend.
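Consider the matrix product $AA = A^2$:
$$
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}^{2}
=
\begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 3 & 1 & 1 & 1 & 0 \\
1 & 1 & 2 & 1 & 1 & 0 \\
1 & 1 & 1 & 3 & 0 & 1 \\
0 & 1 & 1 & 0 & 2 & 0 \\
0 & 0 & 0 & 1 & 0 & 1
\end{pmatrix}
$$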
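The friend-of-a-friend count can be checked numerically. The sketch below, assuming NumPy is available, squares the six-node adjacency matrix from the slides and turns one row of the result into recommendation scores. The function name `friend_of_friend_scores` is a hypothetical helper introduced for illustration.

```python
import numpy as np

# Adjacency matrix of the six-node example network (vertices 1..6).
A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
])

# Entry (i, j) of A @ A counts the common friends of i and j.
A2 = A @ A


def friend_of_friend_scores(A: np.ndarray, i: int) -> np.ndarray:
    # Score every candidate j by the number of common friends with i,
    # excluding i itself and people who are already friends of i.
    scores = (A @ A)[i].copy()
    scores[i] = 0
    scores[A[i] > 0] = 0
    return scores


scores_for_vertex_1 = friend_of_friend_scores(A, 0)
```

For vertex 1 (index 0) the only candidates are vertices 3 and 4, each reachable through the single common friend 2. Vertices 5 and 6 score zero, which illustrates how the baseline ignores the rest of the network.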
58. Eigenvalue Decomposition
Write the matrix A as a product:
$A = U \Lambda U^T$
where
U are the eigenvectors: $U^T U = I$
Λ are the eigenvalues: $\Lambda_{ij} = 0$ when $i \neq j$
59. Computing $A^2$
Use the eigenvalue decomposition $A = U \Lambda U^T$
$A^2 = U \Lambda U^T U \Lambda U^T = U \Lambda^2 U^T$
Exploit U and Λ:
$U^T U = I$ because U contains eigenvectors
$(\Lambda^2)_{ii} = (\Lambda_{ii})^2$ because Λ contains eigenvalues
Result: Just square all eigenvalues!
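A short numerical check of this identity, assuming NumPy and reusing the six-node example matrix: `numpy.linalg.eigh` returns the eigenvalues and orthonormal eigenvectors of a symmetric matrix, so squaring the eigenvalues should reproduce the matrix product.

```python
import numpy as np

# Six-node example adjacency matrix (symmetric).
A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

# eigh is meant for symmetric matrices: the columns of U are orthonormal.
eigenvalues, U = np.linalg.eigh(A)

# Square only the eigenvalues and transform back.
A2_spectral = U @ np.diag(eigenvalues ** 2) @ U.T
A2_direct = A @ A
```

Up to floating-point rounding, `A2_spectral` and `A2_direct` agree.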
62. Why the Matrix Exponential
$A^n$ = number of paths of length n
$aA^2 + bA^3 + cA^4 + \dots$ = number of paths, weighted by path length
→ New edges are more likely to appear when there are many paths already
→ When $a > b > c > \dots > 0$, short paths are weighted more
63. Computing Power Series
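Let p(A) be a power series:
$$
\begin{aligned}
p(A) &= aA^2 + bA^3 + cA^4 + \dots \\
     &= aU\Lambda^2U^T + bU\Lambda^3U^T + cU\Lambda^4U^T + \dots \\
     &= U(a\Lambda^2 + b\Lambda^3 + c\Lambda^4 + \dots)U^T \\
     &= U\,p(\Lambda)\,U^T
\end{aligned}
$$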
Therefore:
Power series change only the eigenvalues!
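The same trick works for any function applied to the eigenvalues. The sketch below, assuming NumPy, compares a truncated power series computed by matrix products with the spectral shortcut, and then applies the exponential function as one example of a series with decreasing weights. The helper name `spectral_function` and the coefficients are illustrative assumptions.

```python
import numpy as np
from numpy.linalg import matrix_power


def spectral_function(A: np.ndarray, f) -> np.ndarray:
    # Apply f to the eigenvalues of a symmetric matrix A: U f(Λ) U^T.
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(f(lam)) @ U.T


A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

# Illustrative weights with a > b > c > 0, so short paths count more.
a, b, c = 0.5, 0.25, 0.125
P_direct = a * matrix_power(A, 2) + b * matrix_power(A, 3) + c * matrix_power(A, 4)
P_spectral = spectral_function(A, lambda x: a * x**2 + b * x**3 + c * x**4)

# Matrix exponential via the eigenvalues: weights 1/k! for paths of length k.
E = spectral_function(A, np.exp)
```

Entries of `P_spectral` or `E` for pairs that are not yet connected can be read as link prediction scores.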
64. TRACKING THE EVOLUTION OF THE NETWORK AS A WHOLE
65. Diversity
• Many, equally-sized subcommunities
• High entropy
• ‘Flat’ structure
Regularity
• Few large subcommunities
• Low entropy
• Many ‘hubs’
66. Network Evolution
• How did a network look at time t?
• Idea: Observe the change of diversity/regularity over time
67. Outline
1. Power-law exponent
2. Weighted spectral distribution
3. Network entropy
4. Network rank
68. 1. Power-law Exponent
Number of neighbors is unevenly distributed:
$C(n) \sim n^{-\gamma}$
Results in a power-law (Newman 2006)
Higher exponent γ denotes less regularity
Example: Epinions trust network (Massa et al. 2005)
69. 1. Power-law Exponent over Time
Epinions trust network (Massa et al. 2005)
γ shrinks ⇒ Network becomes more regular
70. 2. Weighted Spectral Distribution
• Consider the n×n matrix N defined by
Nij = 1 / sqrt(d(i)d(j)) when (i,j) is an edge
Nij = 0 otherwise
Then the distribution of the eigenvalues of N is called the
weighted spectral distribution (WSD) (Fay et al. 2010)
Eigenvalues nearer to ±1: diversity
Eigenvalues nearer to 0: regularity
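The matrix N and its spectrum take only a few lines to compute. This is a minimal sketch, assuming NumPy, a symmetric 0/1 adjacency matrix without isolated vertices, and d(i) read as the degree of vertex i. The function name `weighted_spectrum` is hypothetical.

```python
import numpy as np


def weighted_spectrum(A: np.ndarray) -> np.ndarray:
    # N_ij = A_ij / sqrt(d(i) d(j)), with d(i) the degree of vertex i.
    d = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(d)  # assumes every vertex has at least one edge
    N = A * np.outer(inv_sqrt, inv_sqrt)
    # N is symmetric, so eigvalsh returns real eigenvalues in ascending order.
    return np.linalg.eigvalsh(N)


A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

spectrum = weighted_spectrum(A)
```

All eigenvalues lie between −1 and 1, and a histogram of `spectrum` taken at several points in time is what the weighted spectral distribution tracks.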
71. 2. Weighted Spectral Distribution over Time
CiteULike user–tag network (Emamy et al. 2007)
• The WSD shifts towards zero ⇒ Regularization: the network becomes more regular
72. 3. Network Entropy
G = G1 ∪ G2 ∪ . . . ∪ Gr
• Write the graph G as a sum of subgraphs Gk
Each Gk has weighted edges, with total weight λk
• When picking an edge from G at random, the probability of
it being in community Gk is
λk / (λ1 + λ2 + . . . + λr) = λk / L
• The entropy of this distribution is (Kunegis et al. 2011)
H(G) = − Σk (λk / L) log (λk / L)
• Entropy: Effective number of subcommunities
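A small worked example of the entropy formula. The sketch assumes the natural logarithm (the slide does not state a base) and reads "effective number of subcommunities" as exp(H(G)), which is a common convention but an assumption here. The weights are made-up values for λk.

```python
import math
from typing import Sequence


def network_entropy(weights: Sequence[float]) -> float:
    # H(G) = - sum_k (λk / L) log(λk / L), with L the total weight.
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)


def effective_number(weights: Sequence[float]) -> float:
    # Assumed reading of "effective number of subcommunities": exp(H(G)).
    return math.exp(network_entropy(weights))


balanced = [2.0, 2.0, 2.0, 2.0]   # four equally weighted subcommunities
dominated = [10.0, 1.0, 1.0]      # one large and two small subcommunities
```

Four equally weighted subcommunities give an effective number of 4, while one dominant subcommunity pulls the effective number well below the raw count of 3.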
73. 3. Network Entropy over Time
Enron email network (Klimt et al. 2004)
[Plot: entropy H(G) over time t, shown in absolute scale and zoomed]
Entropy is constant ⇒ Constant number of communities
74. 4. Network Rank
Decompose network into subcommunities:
G = G1 ∪ G2 ∪ . . . ∪ Gr
The rank r is a measure of diversity:
rank(G) = r
Weighted rank:
rank∗(G) = Σk |Gk| / |G1|
Robust measure of diversity (Kunegis et al. 2011)
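The difference between the two rank measures shows up on toy data. The sketch below assumes that |Gk| is a non-negative size or weight per subcommunity and that G1 is the largest one. Both readings are assumptions, since the slide does not define them.

```python
from typing import Sequence


def network_rank(sizes: Sequence[float]) -> int:
    # rank(G) = r: the number of non-empty subcommunities.
    return sum(1 for s in sizes if s > 0)


def weighted_rank(sizes: Sequence[float]) -> float:
    # rank*(G) = sum_k |G_k| / |G_1|, taking G_1 as the largest subcommunity.
    return sum(sizes) / max(sizes)


balanced = [4.0, 4.0, 4.0]
dominated = [10.0, 1.0, 1.0]
```

Both examples have rank 3, but the weighted rank is 3.0 for the balanced case and only 1.2 for the dominated one, so tiny subcommunities barely move it.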
75. 4. Network Rank over Time
Enron email network (Klimt et al. 2004)
[Plot: network rank rank∗(G) over time t]
• Increasing network rank: increasing diversity
• Shrinking network rank: shrinking diversity
76. More Network Rank Plots
[Plots of network rank over time for: Epinions trust network, hep-th citations, Wikipedia elections, frwikibooks edits, MIT conference contacts, YouTube social network]
(biased towards good examples of convex evolution)
77. Conclusion
• Power-law exponent shrinks
– Connection diversity shrinking
• Weighted spectral distribution shifts to zero
– Emerging main components
• Entropy is constant
– Effective number of communities is constant
• Network rank increases, then shrinks
– Two-phase model of expansion
78. Watch out!
KONECT – Koblenz Network Collection
http://uni-koblenz.de/~kunegis/paper/kunegis-konect.poster.pdf
Coming soon!
Follow #ictrobust or @kunegis or @ststaab
79. Why does the sky have the density it has?
Flickr, cc Oct 14, 2007, Michael Donough
80. Why do tagging systems have so little spam?
[Diagram labels: Administrative Process, Content Process, Content Quality, Community Policy, User Roles]
81. Agenda
• Risks and Opportunities in Social Communities:
the ROBUST project
• Web Science Methodology:
An explanation by analogy with Physics
and some initial (!) applications to online communities
• Modeling dynamic system at micro level,
Understanding collective effects (macro level) arising from
individual behavior (micro level)
• Predicting dynamic system behavior,
recognizing behavior deviating from the model
• Modeling dynamic system behavior at the macro level
• Controlling dynamic system behavior by collective action
82. Yahoo Answers
• Ensure quality of user generated content
• Use of administrators and community moderators How?
• Policy influences community processes
84. Communities need Governance
Steering and coordinating actions of community members [Benz2004]
Goal: Successful and flourishing community
High quality user-generated content
Active community members
[ http://www.flickr.com/photos/61433480@N02/5593890914/, http://www.flickr.com/photos/boojee/3733902852/ ]
85. Motivation
Different types of
Web communities
User-generated content (video, photos, comment, article,
questions, answers, posting, review text)
What are the most successful means of
governance for user-generated content?
Analyze successful platforms and compare
their means of governance!
86. Means of Governance
1. Direct intervention of community owner
Affecting content or users based on apparent properties
2. Functionality of the community platform
[Diagram: community members act on user-generated content through assessment (text reviews, bookmarks, ratings, abuse reports), content modification (complex user roles), and selection & ranking (ratings, score, time, views, replies, hide low quality)]
87. Method
Selection of 250 most prominent web sites with community
functionality according to Alexa Page Rank
Clustering web sites in four groups according to purpose
Social Media, Editorial News, Social Networking, Social Reviewing
Top-5 web sites of each group analyzed (*)
88. Key Results
(1) Abuse Reports are a successful means of governance.
• 16 occurrences
• Restricted to filter out unwanted content
• Staff needed – expensive but efficient [Schwagereit2010]
(2) Simple ratings are dominant – but battle between
“Like” and “Like/Dislike”
• “Like”: 9 occurrences
• “Like/Dislike”: 7 occurrences
• Tradeoff between simplicity and improved ranking ability
89. Key Results
(3) Creation time is the most frequently implemented ranking criterion
• 18 occurrences
• Others: score: 8, ratings: 6
• Important content is renewed - unimportant content will be
forgotten
(4) Content modification and user roles are rarely
implemented
2 occurrences
Requires complex role system and users
who understand it
90. GOVERNANCE MODEL: DEEP DIVE - SIMULATION
91. Methodology Principle
1. Define a Web Community model
(Lycos IQ, Yahoo Answers…)
2. Adapt this model to an existing community
3. Estimate parameters
4. Define quality measure
5. Simulate community behaviour
6. Compare simulation results with real data
7. Analyze quality measures wrt variations of CoSiMo
parameters
92. Dataset Lycos IQ
Time Period: 909 days
Users: 34.327
Administrators: 36
Questions: 1.031.982
Answers: 2.996.446
Deleted non-compliant Answers: 21.139
93. Observed parameters (input to simulation)
[3D bar chart: number of users (log scale, 1 to 100000) by answers per year (bins from 0-999 upwards in steps of 1000) and rate of compliant answers (bins from 0.0-0.09 to 0.9-1.0)]
94. Example Behaviors and Example Policies
Behaviors of Ordinary Users:
• Create new postings
• Read existing postings
• Report non-compliant postings OR give bonus points to poster
Moderator Users:
• Create new postings
• Read existing postings
• Delete non-compliant posting OR give bonus points to poster
Administrators:
• Read existing postings
• Delete non-compliant postings
Reading Policies for Administrators:
PA: random selection of postings
PB: random selection of postings that no other administrator has examined so far
PC: selection of postings that were most often reported by users for being non-compliant
Promotion Policy:
PM-X: ordinary users become moderators (who can delete postings) when having at least X bonus points
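The three reading policies and the promotion rule can be written as small selection functions. This is a hypothetical sketch and not the CoSiMo implementation: the `Post` record, its fields and the function names are assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    post_id: int
    reports: int = 0        # number of abuse reports by ordinary users
    examined: bool = False  # already examined by some administrator


def policy_a(posts: List[Post], rng: random.Random) -> Post:
    # PA: random selection of postings.
    return rng.choice(posts)


def policy_b(posts: List[Post], rng: random.Random) -> Optional[Post]:
    # PB: random selection among postings no administrator has examined yet.
    unexamined = [p for p in posts if not p.examined]
    return rng.choice(unexamined) if unexamined else None


def policy_c(posts: List[Post]) -> Post:
    # PC: the posting most often reported as non-compliant.
    return max(posts, key=lambda p: p.reports)


def is_promoted(bonus_points: int, threshold_x: int) -> bool:
    # PM-X: an ordinary user becomes a moderator with at least X bonus points.
    return bonus_points >= threshold_x


rng = random.Random(0)
posts = [Post(1, reports=0, examined=True), Post(2, reports=5), Post(3, reports=1)]
```

A simulation step for one administrator would pick a posting with one of these policies, mark it as examined and delete it if it is non-compliant.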
95. How many administrators are needed?
[3D chart: recent posting quality (0.65 to 1.05) by number of administrators (1, 4, 18, 72, 288, 1152) and additional non-compliant postings per day (5 to 2560)]
96. Fighting spam with administrators…
[3D chart: recent posting quality (0.99 to 1) by number of administrators (1, 9, 72, 576) and applied policies]
Variation of policies and number of administrators
• Efficient policies result in high quality content
• A minimum of 18 administrators are needed
• Many moderators are needed to bring the quality to a high level
97. Fighting spam with user moderators…
[3D chart: recent posting quality (0.6 to 1) by additional non-compliant postings per day (5 to 2560) and applied policies (PA; PA+PB; PA+PB+PC; PA+PB+PC+PM-X for several thresholds X)]
Variation of policies and posting quality
• A limited number of administrators has a limited capacity of
filtering a surge of non-compliant postings
• Moderators are helping to increase quality
98. Lessons Learned
• Strategy of selecting questionable postings is crucial
• Reporting by normal users is the most effective strategy
• Moderators are not as effective as expected if they hunt
only incidentally for non-compliant content
• Sufficiently strong requirements regarding moderator
profiles lead to high quality of moderators
• Policies for promoting users need to be based on a
criterion that is time dependent
99. Agenda
• Risks and Opportunities in Social Communities:
the ROBUST project
• Web Science Methodology:
An explanation by analogy with Physics
and some initial (!) applications to online communities
• Modeling dynamic system at micro level,
Understanding collective effects (macro level) arising from
individual behavior (micro level)
• Predicting dynamic system behavior,
recognizing behavior deviating from the model
• Modeling dynamic system behavior at the macro level
• Controlling dynamic system behavior by collective action
100. Are we satisfied here? No! Not by far!
Understand how and why users tag or tweet?
-> What are people's limitations that affect the system?
-> Psychology and Sociology!
What are their legal boundaries?
-> How can you shape the systems?
-> Law!
What are organizations' incentives?
-> Why and how do organizations participate?
-> Nice example: open source
-> Economics
101. Thank You!
102. References
The Slashdot Zoo: Mining a social network with negative edges
J. Kunegis, A. Lommatzsch and C. Bauckhage
In Proc. World Wide Web Conf., pp. 741–750, 2009.
Learning spectral graph transformations for link prediction
J. Kunegis and A. Lommatzsch
In Proc. Int. Conf. on Machine Learning, pp. 561–568, 2009.
Spectral analysis of signed graphs for clustering, prediction and
visualization
J. Kunegis, S. Schmidt, A. Lommatzsch and J. Lerner
In Proc. SIAM Int. Conf. on Data Mining, pp. 559–570, 2010.
Network growth and the spectral evolution model
J. Kunegis, D. Fay and C. Bauckhage
In Proc. Conf. on Information and Knowledge Management,
pp. 739–748, 2010.
103. References
B. Viswanath, A. Mislove, M. Cha, K. P. Gummadi, On the
evolution of user interaction in Facebook. In Proc.
Workshop on Online Social Networks, pp. 37–42, 2009.
104. References
K. Dellschaft, S. Staab. An Epistemic Dynamic Model for
Tagging Systems. HYPERTEXT 2008, Proceedings of the
19th ACM Conference on Hypertext and Hypermedia,
June 19-21, 2008 - Pittsburgh, Pennsylvania, USA.
K. Dellschaft, S. Staab. On Differences in the Tagging
Behavior of Spammers and Regular Users. In: Proc. of
WebSci-2010, Raleigh, April, 2010.
F. Schwagereit, S. Sizov, S. Staab. Finding Optimal Policies
for Online Communities with CoSiMo. In: Proc. of WebSci-
2010, Raleigh, US, April, 2010.