What makes communities tick? Community health analysis using role compositions
1. WHAT MAKES COMMUNITIES TICK?
COMMUNITY HEALTH ANALYSIS USING ROLE COMPOSITIONS
MATTHEW ROWE (1) AND HARITH ALANI (2)
(1) SCHOOL OF COMPUTING AND COMMUNICATIONS, LANCASTER UNIVERSITY, LANCASTER, UK
(2) KNOWLEDGE MEDIA INSTITUTE, THE OPEN UNIVERSITY, MILTON KEYNES, UK
2012 ASE/IEEE INTERNATIONAL CONFERENCE ON SOCIAL COMPUTING
AMSTERDAM, THE NETHERLANDS
http://www.matthew-rowe.com | http://www.lancs.ac.uk/staff/rowem
m.rowe@lancaster.ac.uk
2. Managing Online Communities
Many businesses provide online communities to:
Increase customer loyalty
Raise brand awareness
Spread word-of-mouth
Facilitate idea generation
Online communities incur significant investment in terms of:
Money spent on hosting and bandwidth
Time and effort for maintenance
Community managers monitor community ‘health’ to:
Ensure longevity
Enable value generation
However, the notion of ‘health’ is hard to pin down
3. The Need for Interpretation
Online communities are dynamic behavioural ecosystems
Users in communities can be defined by their roles
i.e. Exhibiting similar collective behaviour
Prevalent behaviour can impact upon community members and health
Management of communities is helped by:
Understanding the relation between behaviour and health
How user behaviour changes are associated with health
Encouraging users to modify behaviour, in turn affecting health
e.g. content recommendation to specific users
Predicting health changes
Enables early decision making on community policy
Can we accurately and effectively detect positive and negative changes in
community health from its composition of behavioural roles?
4. Outline
SAP Community Network
Community Health Indicators
Measuring Role Compositions:
Measuring user behaviour
Inferring behaviour roles
Mining behaviour roles
Experiments:
Health Indicator Regression
Health Change Detection
Findings and Conclusions
5. SAP Community Network
Collection of SAP forums in which users discuss:
Software development
SAP Products
Usage of SAP tools
Points system for awarding best answers
Enables development of user reputation
Provided with a dataset covering 33 communities:
Spanning 2004 - 2011
95,200 threads
421,098 messages
78,690 were allocated points
32,942 users
[Figure: post count per year, 2004 to 2011]
6. Community Health Indicators
From the literature there is no single agreed measure of ‘community health’
Multi-faceted nature: loyalty, participation, activity, social capital
Different communities and platforms look at different indicators
Indicator 1: Churn Rate (loyalty)
The proportion of users who participate in a community for the final time
Indicator 2: User Count (participation)
The number of participating users in the community
Indicator 3: Seeds-to-Non-Seeds Posts Proportion (activity)
The proportion of seed posts (i.e. thread starters that receive a reply) to non-seed posts (i.e. thread starters that receive no reply)
Indicator 4: Clustering Coefficient (social capital)
The average of users’ clustering coefficients within the largest strongly connected
component
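As a rough illustration of the first three indicators, the sketch below computes them from a toy post log for one time window. The record layout (`Post` with `user`, `window`, `is_thread_starter`, `has_reply`) and the function names are assumptions made for this example, not the authors' implementation. It also assumes a user counts as churned in a window when that window is the last one in which they post. The clustering coefficient is left out because it needs a reply graph.

```python
from collections import namedtuple

# Hypothetical record layout for one forum post (an assumption for this sketch).
Post = namedtuple("Post", ["user", "window", "is_thread_starter", "has_reply"])


def user_count(posts, window):
    """Indicator 2: number of distinct users who posted in the window."""
    return len({p.user for p in posts if p.window == window})


def churn_rate(posts, window):
    """Indicator 1: share of the window's active users who never post again."""
    last_seen = {}
    for p in posts:
        last_seen[p.user] = max(last_seen.get(p.user, p.window), p.window)
    active = {p.user for p in posts if p.window == window}
    if not active:
        return 0.0
    churned = [u for u in active if last_seen[u] == window]
    return len(churned) / len(active)


def seeds_to_non_seeds(posts, window):
    """Indicator 3: seeds (replied-to thread starters) divided by non-seeds."""
    starters = [p for p in posts if p.window == window and p.is_thread_starter]
    seeds = sum(1 for p in starters if p.has_reply)
    non_seeds = len(starters) - seeds
    # The ratio is undefined when there are no non-seeds; return None in that case.
    return seeds / non_seeds if non_seeds else None


# Toy data: two time windows of posts in one community.
posts = [
    Post("ann", 1, True, True),
    Post("bob", 1, False, False),
    Post("cat", 1, True, False),
    Post("ann", 2, True, True),
    Post("dan", 2, True, True),
    Post("cat", 2, True, False),
]
```

In the toy data only `bob` stops posting after window 1, so `churn_rate(posts, 1)` is one third of that window's three active users.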
7. Measuring Role Compositions I:
Modelling and Measuring User Behaviour
According to existing literature, user behaviour can be defined using 6 dimensions (Hautz et al., 2010), (Nolker and Zhou, 2005), (Zhu et al., 2009), (Zhu et al., 2011):
Focus Dispersion
Measure: Forum entropy of the user
Engagement
Measure: Out-degree proportioned by potential maximal out-degree
Popularity
Measure: In-degree proportioned by potential maximal in-degree
Contribution
Measure: Proportion of thread replies created by the user
Initiation
Measure: Proportion of threads that were initiated by the user
Content Quality
Measure: Average points per post awarded to the user
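The six measures above can be sketched with a few small helpers. This is a minimal illustration under stated assumptions: the function names are hypothetical, the entropy uses base-2 logarithms, and the "potential maximal" degree is taken to be n - 1 for a community of n users. The slides do not specify any of these details.

```python
import math
from collections import Counter


def forum_entropy(forum_ids):
    """Focus dispersion: Shannon entropy (base 2) of a user's posts over forums."""
    counts = Counter(forum_ids)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def degree_proportion(degree, n_users):
    """Engagement (out-degree) or popularity (in-degree) over the maximum n - 1."""
    return degree / (n_users - 1) if n_users > 1 else 0.0


def share(user_total, community_total):
    """Contribution or initiation: the user's share of all replies or threads."""
    return user_total / community_total if community_total else 0.0


def average_points(points_per_post):
    """Content quality: mean points awarded per post."""
    return sum(points_per_post) / len(points_per_post) if points_per_post else 0.0


# Toy user: posts split evenly over two forums, replied to 3 of 10 other users.
dispersion = forum_entropy(["abap", "abap", "hana", "hana"])
engagement = degree_proportion(3, 11)
initiation = share(2, 40)
quality = average_points([10, 0, 2])
```

A user who posts in a single forum gets an entropy of zero (fully focussed), while an even split across two forums gives one bit.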
8. Measuring Role Compositions II:
Inferring Roles
1. Construct features for community users at a given time step
2. Derive bins using equal frequency binning
Popularity-low cutoff = 0.5, Initiation-high cutoff = 0.4
3. Use skeleton rule base to construct rules using bin levels
Popularity = low, Initiation = high -> roleA
Popularity < 0.5, Initiation > 0.4 -> roleA
4. Apply rules to infer user roles and community composition
5. Repeat 1-4 for following time steps
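Steps 1 to 4 can be sketched for a single time step as follows. This is an illustrative sketch, not the authors' code: the two-rule `SKELETON_RULES` table, the three-level binning, and the function names are assumptions, and the feature values are made up.

```python
def equal_frequency_cutoffs(values, n_bins=3):
    """Cut points that split the sorted values into n_bins equally sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def to_level(value, cutoffs, labels=("low", "mid", "high")):
    """Return the label of the first bin whose upper cut point exceeds the value."""
    for cutoff, label in zip(cutoffs, labels):
        if value < cutoff:
            return label
    return labels[-1]


# Hypothetical skeleton rule base: role -> required level for each dimension.
SKELETON_RULES = {
    "roleA": {"popularity": "low", "initiation": "high"},
    "roleB": {"popularity": "high", "initiation": "low"},
}


def infer_role(levels, rules=SKELETON_RULES):
    """Return the first role whose conditions all match, or None if none match."""
    for role, conditions in rules.items():
        if all(levels.get(dim) == lvl for dim, lvl in conditions.items()):
            return role
    return None


def role_composition(users, rules=SKELETON_RULES):
    """users: {user: {dimension: value}} -> {role: proportion of users}."""
    dims = sorted({d for feats in users.values() for d in feats})
    cutoffs = {d: equal_frequency_cutoffs([f[d] for f in users.values()]) for d in dims}
    counts = {}
    for feats in users.values():
        levels = {d: to_level(feats[d], cutoffs[d]) for d in dims}
        role = infer_role(levels, rules)
        counts[role] = counts.get(role, 0) + 1
    return {role: c / len(users) for role, c in counts.items()}


# Made-up features for six users at one time step.
users = {
    "u1": {"popularity": 0.1, "initiation": 0.9},
    "u2": {"popularity": 0.2, "initiation": 0.8},
    "u3": {"popularity": 0.3, "initiation": 0.5},
    "u4": {"popularity": 0.4, "initiation": 0.4},
    "u5": {"popularity": 0.5, "initiation": 0.2},
    "u6": {"popularity": 0.6, "initiation": 0.1},
}
composition = role_composition(users)
```

Because the cut points are recomputed from each time step's feature distribution, the same rule (Popularity = low, Initiation = high) can translate into different numeric thresholds at different steps.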
9. Measuring Role Compositions III:
Mining Roles (Skeleton rule base compilation)
1. Select the tuning segment
2. Discover correlated behaviour dimensions
Removed Engagement and Contribution, and kept Popularity (Pearson r > 0.75, p < 0.01)
3. Cluster users into behavioural groups
4. Derive role labels for clusters
Feature distributions are matched against the feature levels derived from equal-frequency binning
Role labels by cluster:
• 1 - Focussed Novice
• 2,5 - Mixed Novice
• 7 - Distributed Novice
• 3 - Distributed Expert
• 8,9 - Mixed Expert
• 0 - Focussed Expert Participant
• 4 - Focussed Expert Initiator
• 6 - Knowledgeable Member
• 10 - Knowledgeable Sink
TABLE II
MAPPING OF CLUSTER DIMENSIONS TO LEVELS. THE CLUSTERS ARE ORDERED FROM LOW PATTERNS TO HIGH PATTERNS TO AID LEGIBILITY.
Cluster   Dispersion   Initiation   Quality   Popularity
1         L            L            L         L
0         L            M            H         L
6         L            H            M         M
10        L            H            M         H
4         L            H            H         M
2,5       M            H            L         H
8,9       M            H            H         H
7         H            H            L         H
3         H            H            H         H
[Fig. 2. Boxplots of the feature distributions in each of the 11 clusters. Panels: Dispersion, Initiation, Quality, Popularity; x-axis: Cluster]
Paper excerpt:
[...] as a parameter k. To judge the best model (i.e. clustering method and number of clusters) we measure the cohesion and separation of a given clustering as follows: for each clustering algorithm (Ψ) we iteratively increase the number of clusters to use, where 2 ≤ k ≤ 30. At each increment of k we record the silhouette coefficient produced by Ψ; this is defined for a given element (i) in a given cluster as:
\[ s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \qquad (3) \]
where a_i denotes the average distance to all other items in the same cluster and b_i is given by calculating the average distance to all items in each other distinct cluster and then taking the minimum distance. The value of s_i ranges between −1 and 1, where the former indicates a poor clustering in which distinct items are grouped together and the latter indicates perfect cluster cohesion and separation. To derive the silhouette coefficient s(Ψ(k)) for the entire clustering we take the average silhouette coefficient of all items. We find that the best clustering model and number of clusters to use is K-means with 11 clusters. We found that for smaller cluster numbers (k = [3, 8]) each clustering algorithm achieves comparable performance; however, as we begin to increase the cluster numbers K-means improves while the two remaining algorithms produce worse cohesion and separation.
Deriving Role Labels: Provided with the most cohesive and separated clustering of users we then derive role labels for each cluster. Role label derivation first involves inspecting the dimension distribution in each cluster and aligning the distribution with a level mapping (i.e. low, mid, high). This enables the conversion of continuous dimension ranges into discrete values which our rule-based approach requires in the Skeleton Rule Base. To perform this alignment we assess the [...]
[...] decision node, we measure the entropy of the dimensions and their levels across the clusters; we then choose the dimension with the largest entropy. This is defined formally as:
\[ H(dim) = -\sum_{level}^{|levels|} p(level \mid dim) \log p(level \mid dim) \qquad (4) \]
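The two formulas quoted from the paper, the silhouette coefficient (3) and the dimension entropy (4), can be sketched in plain Python. The following is an illustration under assumptions: Euclidean distance, base-2 logarithms, a score of 0 for singleton clusters, and the function names are all choices made for this example. The level lists are read off the rows of Table II.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette(points, labels):
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i); needs at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a_i: mean distance to the other members of the same cluster
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b_i: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def level_entropy(levels):
    """H(dim): entropy of a dimension's level labels across the clusters."""
    total = len(levels)
    probs = [levels.count(lv) / total for lv in set(levels)]
    return sum(-p * math.log2(p) for p in probs)


# Two well-separated toy groups, labelled correctly and then incorrectly.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])
bad = silhouette(points, [0, 1, 0, 1])

# Level columns as listed in the rows of Table II.
table_ii = {
    "dispersion": list("LLLLLMMHH"),
    "initiation": list("LMHHHHHHH"),
    "quality": list("LHMMHLHLH"),
    "popularity": list("LLMHMHHHH"),
}
first_split = max(table_ii, key=lambda dim: level_entropy(table_ii[dim]))
```

Selecting k would then amount to running the chosen clustering algorithm for each candidate k and keeping the labelling with the highest mean silhouette.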
10. Experiment 1: Health Indicator Regression
Managing online communities is helped by understanding the
relation between behaviour and health
Experimental Setup
Induced Linear Regression Models for each Health Indicator and
Community
Using a time-series dataset
Independent variables: 9 roles with composition proportions as values at a given time point
E.g. @ t = k: Mixed Expert = 0.05, Distributed Novice = 0.51, etc.
Dependent variable: health indicator (e.g. churn rate) at the same time point
E.g. @ t = k: Churn Rate = 0.21
PCA of each community health indicator model using the model’s coefficients
Look for a common health composition pattern
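To make the regression setup concrete, here is a minimal ordinary least squares sketch that regresses a health indicator on role proportions. It is an assumption-laden illustration: only two roles are used, the proportions and coefficients are made up, and the normal equations are solved with a small hand-written Gaussian elimination rather than whatever tooling the authors used. With all nine role proportions, which may sum to one, one role would likely need to be dropped or the intercept omitted to avoid collinearity.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_linear_regression(features, target):
    """Least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up role proportions (Mixed Expert, Distributed Novice) at five time steps.
compositions = [(0.05, 0.51), (0.07, 0.48), (0.04, 0.55), (0.09, 0.40), (0.06, 0.50)]
# Synthetic churn rates generated from known coefficients so the fit can be checked.
churn = [0.3 - 1.0 * me + 0.2 * dn for me, dn in compositions]

intercept, b_mixed_expert, b_distributed_novice = fit_linear_regression(compositions, churn)
```

Each community and health indicator would get its own coefficient vector in this way, and those vectors are what the PCA step then compares across communities.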
11. Experiment 1: Health Indicator Regression
Results
[Figure: PCA of the per-community regression model coefficients, one panel per health indicator (Churn Rate, User Count, Seeds / Non-seeds Prop, Clustering Coefficient); axes: PC1 vs PC2; points labelled by forum ID]
Common Health Composition Pattern
Churn Rate: Differences for Focussed Expert Participant & Mixed Expert, similarities for
Focussed Expert Initiators (decrease in role correlated with increase in churn rate)
User Count: Differences for Focussed Expert Initiators, commonalities for knowledgeable roles
Seeds-to-Non-Seeds: Similar effects for Focussed Expert Initiators and Participants, and
Distributed Experts (all decrease in role correlated with increased proportion)
Clustering Coefficient: no common patterns
Idiosyncratic Health Composition Pattern
Divergence patterns between outlier communities
No general pattern exists that describes the relation between roles and health
12. Experiment 2: Health Change Detection
Can we accurately and effectively detect positive and negative changes in
community health from its composition of behavioural roles?
Experimental Setup
Binary classification of indicator change
At t=k+1: predict increase or decrease in health indicator from t=k
Time-ordered dataset:
Features @ t=k+1: 9 roles with composition proportions as values
Class @ t=k+1: positive (if increase from t=k), negative (if decrease)
Divide dataset into 80/20 split maintaining time-ordering
Tested using a logistic regression classifier
Platform-level model
Community-specific model
Evaluated using Matthews Correlation Coefficient (MCC) and Area under the ROC
Curve (AUC)
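The labelling, time-ordered split, and MCC evaluation can be sketched as below. The function names are hypothetical, and treating "no change" as the negative class is an assumption, since the slide only defines increase and decrease. The classifier itself is omitted.

```python
import math


def change_labels(indicator_series):
    """Label step k+1 as 1 if the indicator rose from step k, else 0."""
    pairs = zip(indicator_series, indicator_series[1:])
    return [1 if nxt > cur else 0 for cur, nxt in pairs]


def time_ordered_split(rows, train_fraction=0.8):
    """Split chronologically sorted rows into train and test without shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up churn-rate series for one community.
churn_series = [0.20, 0.25, 0.21, 0.21, 0.30]
labels = change_labels(churn_series)
train, test = time_ordered_split(list(range(10)))
```

Keeping the split chronological means the model is always evaluated on time steps later than those it was trained on, which matches the detection scenario described above.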
13. Experiment 2: Health Change Detection
Results
Per-forum models outperform platform models for each health indicator
Demonstrates the need to assess and understand communities individually
We also yield good performance for outlier communities
ROC Curves surpass baseline for:
Churn rate: 20/25 forums
User Count: 20/25 forums
Seeds-to-Non-Seeds: 19/25 forums
Clustering Coefficient: 17/25 forums
[Figure: ROC curves, one panel per health indicator (Churn Rate, User Count, Seeds / Non-seeds Prop, Clustering Coefficient); axes: FPR vs TPR]
TABLE IV
PERFORMANCE OF DETECTING HEALTH CHANGES USING A LOGISTIC REGRESSION MODEL INDUCED: ACROSS THE ENTIRE PLATFORM (FIGURE IV(A)), PER-FORUM (FIGURE IV(B)) AND FOR SPECIFIC CENTRAL AND OUTLIER FORUMS (FIGURE IV(C)). IN THIS LATTER CASE WE REPORT THE MATTHEWS CORRELATION COEFFICIENT AND THE F1 SCORE.
(a) Platform
Class                    MCC       Prec    Recall  F1      AUC
Churn                    0.047     0.573   0.630   0.531   0.590
User Count               0.035     0.591   0.646   0.522   0.598
Seeds / Non-seeds        0.078     0.592   0.640   0.566   0.617
Clustering Coefficient   0.077.    0.591   0.641   0.581   0.647
Signif. codes: p-value < 0.001 *** 0.01 ** 0.05 * 0.1 . 1
(b) Per-forum
Class                    MCC       Prec    Recall  F1      AUC
Churn                    0.110**   0.618   0.634   0.619   0.569
User Count               0.175**   0.652   0.661   0.650   0.589
Seeds / Non-seeds        0.163*    0.637   0.657   0.639   0.589
Clustering Coefficient   0.089**   0.624   0.642   0.626   0.568
Signif. codes: p-value < 0.001 *** 0.01 ** 0.05 * 0.1 . 1
(c) Forum Specific Results. MCC / F1
                         Central                                          Outliers
Class                    252             412              414              353              419             50
Churn                    0.105 / 0.564   0.042 / 0.621    0.284 / 0.700    -0.076 / 0.543   0.173 / 0.633   0.092 / 0.58[...]
User Count               0.088 / 0.543   0.580 / 0.903    -0.106 / 0.701   0.279 / 0.648    0.299 / 0.667   0.343 / 0.69[...]
Seeds / Non-seeds        0.117 / 0.575   0.339 / 0.717    0.189 / 0.744    0.007 / 0.519    0.265 / 0.632   0.400 / 0.81[...]
Clustering Coefficient   0.057 / 0.536   -0.043 / 0.568   0.353 / 0.727    0.156 / 0.582    0.127 / 0.568   0.282 / 0.64[...]
Paper excerpt:
[...] find that for the 412 and 414 central forums we achieve poorer performance than the baseline for the User Count and Clustering Coefficient.
Results: Health Danger Detection: Thus far we have assessed how well our detection models work in both class settings (i.e. increase and decrease). We now move to a scenario in which we wish to detect health dangers, and in doing so provide warnings to community managers of the likely reduction in health of their communities. To do this [...]
14. Findings and Conclusions
No global composition pattern for the entirety of SCN
Identified key differences as to ‘What makes Communities tick’
Decrease in Focussed Experts correlated with an increase in Seeds-to-Non-Seeds
(Marin et al., 2009) found a correlation between increase in Core Users and
Network Cohesion
We found a correlation between an increase in Knowledgeable Sinks and Social Capital
Accurate detection of community health change is possible using role composition
information
Significantly outperformed baseline models
Per-forum models outperformed platform-level models
Future Work:
Explore co-dependencies between health indicators
Application of our approach over different communities and platforms
E.g. IBM Connections, Boards.ie
15. Questions?
Web: http://www.matthew-rowe.com | http://www.lancs.ac.uk/staff/rowem
Email: m.rowe@lancaster.ac.uk
Twitter: @mattroweshow
Editor's notes
For common health composition pattern:
Assess three forums in the central cluster and differences in coefficients
252 SAP Business One E-Commerce
412 Business Planning
414 Strategy Management
Differences show that no general pattern exists