MEASURING THE TOPICAL
SPECIFICITY OF ONLINE
COMMUNITIES
Extended Semantic Web Conference 2013
Montpellier, France
MATTHEW ROWE1, CLAUDIA WAGNER2, MARKUS
STROHMAIER3 AND HARITH ALANI4
1SCHOOL OF COMPUTING AND COMMUNICATIONS, LANCASTER UNIVERSITY, LANCASTER,
UK
@MROWEBOT | M.ROWE@LANCASTER.AC.UK
2INSTITUTE FOR INFORMATION AND COMMUNICATION TECHNOLOGIES, JOANNEUM
RESEARCH, GRAZ, AUSTRIA
3KNOWLEDGE MANAGEMENT INSTITUTE AND KNOW CENTRE, GRAZ UNIVERSITY OF
TECHNOLOGY, GRAZ, AUSTRIA
4KNOWLEDGE MEDIA INSTITUTE, THE OPEN UNIVERSITY, MILTON KEYNES, UK
Why measure the topical specificity of Online Communities?
Sub-Community Creation
• [Belak et al. 2011] identified community (topical) drift
• We found the same…
• Understanding topical specificity is important for:
  ◦ Tracking community focus
  ◦ Suggesting new communities
[Figure: cumulative distribution functions F(Entropy) and F(JS-Divergence), each comparing High Specificity and Low Specificity communities]
Theory of Attention Dynamics
Ignorance isn’t Bliss:
An Empirical Analysis of Attention Patterns in
Online Communities
Claudia Wagner*, Matthew Rowe†, Markus Strohmaier‡, and Harith Alani†
*Institute of Information and Communication Technologies, JOANNEUM RESEARCH, Graz, Austria
Email: claudia.wagner@joanneum.at
†Knowledge Media Institute, The Open University, Milton Keynes, UK
Email: m.c.rowe@open.ac.uk, halani@open.ac.uk
‡Knowledge Management Institute and Know-Center, Graz University of Technology, Graz, Austria
Email: markus.strohmaier@tugraz.at
Abstract—Online community managers work towards building
and managing communities around a given brand or topic. A
risk imposed on such managers is that their community may die
out and its utility diminish to users. Understanding what drives
attention to content and the dynamics of discussions in a given
community informs the community manager and/or host with the
factors that are associated with attention, allowing them to detect
a reduction in such factors. In this paper we gain insights into
the idiosyncrasies that individual community forums exhibit in
their attention patterns and how the factors that impact activity
differ. We glean such insights through a two-stage approach that
functions by (i) differentiating between seed posts - i.e. posts that
solicit a reply - and non-seed posts - i.e. posts that did not get any
replies, and (ii) predicting the level of attention that seed posts
will generate. We explore the effectiveness of a range of features
for predicting discussions and analyse their potential impact on
discussion initiation and progress.
Our findings show that the discussion behaviour of different
communities exhibits interesting differences in terms of how
attention is generated. Our results show, among other things, that
the purpose of a community as well as the specificity of the topic
of a community impact which factors drive the reply behaviour
of a community. For example, communities around very specific
topics require posts to fit to the topical focus of the community in
order to attract attention while communities around more general
topics do not have this requirement. We also found that the
factors which impact the start of discussions in communities often
differ from the factors which impact the length of discussions.
Index Terms—attention, online communities, discussion, pop-
another. For example, what catches the attention of users in a
question-answering or a support-oriented community may not
have the same effect in conversation-driven or event-driven
communities. In this paper we use the number of replies that a
given post on a community message board yields as a measure
of its attention.
To explore these and related questions, our paper sets out
to study the following two research questions:
1) Which factors impact the attention level a post gets in
certain community forums?
2) How do these factors differ between individual community forums?
Understanding what factors are associated with attention in
different communities could inform managers and hosts of
community forums with the know-how of what drives attention
and what catches the attention of users in their community.
Empowered with such information, managers could then detect
changes in such factors that could potentially impact community
activity and cause the utility of the community to alter.
We approach our research questions through an empirical
study of attention patterns in 20 randomly selected forums on
the Irish community message board Boards.ie. Our study was
facilitated through a two-stage approach that (i) differentiates
between seed posts - i.e. thread starters on a community
Community specificity has a bearing on attention dynamics
Community Recommendation
• Users interested in a new topic often visit online communities
• Recommending a specific community could overwhelm the user
  ◦ Nuanced language, expert terms
• Need a comparative assessment of specificity between communities
has sub-forum
Can we empirically characterise how specific a given
community is based on what its users discuss?
What do we mean by ‘specificity’?
• We interpret a community forum’s specificity in relation to its parent…
Measuring Topical Specificity:
Our Approach
Retrieve community posts in time window [t,t’] (output: P)
→ Derive community concept model (output: A)
→ Select concept using composite function (output: c)
→ Measure concept abstraction
Extracting Concepts and Concept Models
• Given: posts P published in forum f within [t,t’]
  ◦ Extract entities from each post’s content using Zemanta
  ◦ Get the concept that each entity is:
    a) an instance of: <entity rdf:type class>
    b) in the category of: <entity dcterms:subject category>
• To build the concept model for the forum:
  ◦ Record the frequency of each concept’s occurrence
that characterise the forum in the time period. We do this by processing each post content $s \in S_f^{t' \to t''}$ using a concept extraction tool $\Psi(s)$ to return the set of concepts related to the content of $s$. We build the concept model for the community by recording the frequency of concept occurrences in the input post set, returning $A_f^{t' \to t''}$. This set is derived using the following construct:
$$A_f^{t' \to t''}[c_i] = \left|\{c_i : c_i \in \Psi(s),\ s \in S_f^{t' \to t''}\}\right| \tag{2}$$
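The counting step behind the concept model can be sketched as follows. This is a minimal illustration only: `build_concept_model`, `toy_extract` and the `TOY_LEXICON` keyword lookup are hypothetical stand-ins for the real entity and concept extraction service, and the concept names are invented for the example.

```python
from collections import Counter
from typing import Callable, Iterable, Set


def build_concept_model(posts: Iterable[str],
                        extract: Callable[[str], Set[str]]) -> Counter:
    """Count, for each concept, how many posts mention it."""
    model: Counter = Counter()
    for post in posts:
        # extract() returns a set, so a concept counts once per post
        model.update(extract(post))
    return model


# Hypothetical keyword lookup standing in for the extraction tool
TOY_LEXICON = {
    "pratchett": "dbo:Writer",
    "rincewind": "dbo:FictionalCharacter",
    "novel": "dbo:Book",
}


def toy_extract(text: str) -> Set[str]:
    words = text.lower().split()
    return {TOY_LEXICON[w] for w in words if w in TOY_LEXICON}


posts = ["Pratchett wrote another novel",
         "Rincewind appears in the first novel"]
model = build_concept_model(posts, toy_extract)
```

Here `model.most_common(1)` gives the most frequently cited concept, which is what the Concept Frequency composite function would select.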
4 Measuring Topical Specificity
Concept Extraction Model Set of post contents
Selecting Concepts using Composite Functions:
Concept Frequency
Types Categories
Choose the most frequently cited concept in the forum
Selecting concepts using Composite Functions:
Concept Frequency-Inverse Forum Frequency
• Measures how unique a concept is to the forum
  ◦ High CF-IFF = unique forum concept given other forums
• We choose the concept that maximises CF-IFF…
2. Concept Frequency-Inverse Forum Frequency: This function selects the most unique concept discussed in the forum with respect to all forums. This is a modification of the existing Term Frequency-Inverse Document Frequency measure used for term indexation. The Concept Frequency-Inverse Forum Frequency of each concept in a given forum is measured and the concept that returns the maximum value is chosen. The abstraction of this concept is then measured and the reciprocal of this value taken as the specificity of the forum. We define the Concept Frequency-Inverse Forum Frequency as follows:
$$\text{cf-iff}(c, f, F) = \frac{\left|A_f^{t' \to t''}[c]\right|}{\max\left(A_f^{t' \to t''}[c'] : c' \in A_f^{t' \to t''}\right)} \times \log \frac{|F|}{\left|\{f \in F : c' \in A_f^{t' \to t''},\ c' = c\}\right|} \tag{3}$$
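A small sketch of how CF-IFF could be computed over a set of forum concept models is given below. It assumes each concept model is a plain dictionary of concept counts; the function names, the forum names and the counts are illustrative only and do not come from the dataset.

```python
import math
from typing import Dict

ConceptModels = Dict[str, Dict[str, int]]  # forum -> {concept: frequency}


def cf_iff(concept: str, forum: str, models: ConceptModels) -> float:
    """Normalised concept frequency times log inverse forum frequency."""
    counts = models[forum]
    # Frequency of the concept, normalised by the forum's most frequent concept
    cf = counts.get(concept, 0) / max(counts.values())
    # Number of forums whose concept model contains the concept
    forums_with_concept = sum(1 for m in models.values() if m.get(concept, 0) > 0)
    if forums_with_concept == 0:
        return 0.0
    return cf * math.log(len(models) / forums_with_concept)


def select_concept(forum: str, models: ConceptModels) -> str:
    """Pick the concept that maximises CF-IFF for the forum."""
    return max(models[forum], key=lambda c: cf_iff(c, forum, models))


# Invented toy data
models = {
    "discworld": {"dbo:Book": 4, "dbo:Writer": 2},
    "films": {"dbo:Film": 5, "dbo:Writer": 1},
    "golf": {"dbo:GolfPlayer": 3},
}
chosen = select_concept("discworld", models)
```

In this toy data `dbo:Writer` appears in two of the three forums, so its inverse forum frequency is lower than that of `dbo:Book`, which only the `discworld` forum uses.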
4.2 Concept Abstraction Measures
The composite functions decide on which concept to measure based on either: a)
the frequency of the concept in the forum, or b) the uniqueness of the concept
with respect to the other forums. To measure concept abstraction we define five
measures as follows, which either leverage the network structure surrounding a
concept or use the semantics of relations in the concept graph.
Network Entropy. Our first measure of concept abstraction (a(c)) is based on
The normalised frequency of the concept
appearing within the forum
How common/rare a concept
is across the forums
Measuring Specificity
• Given a concept c selected using a composite function, how can we measure its specificity?
• Solution: use information-theoretic measures of abstraction…
  ◦ Abstraction of concept c: a(c)
  ◦ Specificity of concept c: 1/a(c)
• We examine five measures of abstraction…
Measuring Abstraction: Network Entropy
• Premise: more abstract tags co-occur with many other tags [Benz et al. 2011]
  ◦ I.e. increase in variation of a random variable
• We adapt this for concepts…
  ◦ Derive co-occurrence frequencies between concepts using number of edges (relations) between them
  ◦ Define conditional probability of concept co-occurrence
work by [3] in which tag abstraction is measured through the uniformity of co-occurrences. The general premise is that a more abstract tag should co-occur with many other tags, thus producing a higher entropy, as there is more uncertainty associated with the term. In the context of our work we can apply the same notion; however, we must adapt the notion of co-occurrence slightly to deal with concepts. To begin with, we need to define a preamble that will allow network entropy, and the network-theoretic measures below, to be calculated, using the same definition as laid out in [4]: let $G = \{V, E, L\}$ denote a concept-network, where $V$ is the set of concept nodes (with $c \in V$), $e_{cc'} \in E$ is an edge, or link, connecting $c, c' \in V$, and $lb(e_{cc'}) \in L$ denotes a label of the edge, i.e. the predicate associating $c$ with $c'$. We can define the weight of the relation between two concepts $c$ and $c'$ by the number of times they are connected to one another in the graph: $w(c, c') = |\{e_{cc'} \in E\}|$. From this weight measurement, derived from concept co-occurrence, we then derive the conditional probability of $c$ appearing with $c'$ as follows, using $ego(c)$ to denote the ego-network of the concept $c$, i.e. the triples in the immediate vicinity of $c$:
$$p(c'|c) = \frac{w(c, c')}{\sum_{c'' \in ego(c)} w(c, c'')} \tag{4}$$
Now that we have defined the conditional probability of $c$ appearing with another concept $c'$, we define the network entropy of $c$ as follows:
$$H(c) = -\sum_{c' \in ego(c)} p(c'|c) \log p(c'|c) \tag{5}$$
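The network entropy calculation can be sketched in a few lines. The example assumes the concept graph is available as a list of (concept, concept, predicate) triples and treats edges as undirected when counting connections; both are assumptions made for illustration, as are the class names and predicates in the toy data.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Edge = Tuple[str, str, str]  # (concept, concept, predicate label)


def network_entropy(concept: str, edges: Iterable[Edge]) -> float:
    """Entropy of the co-occurrence distribution over a concept's neighbours."""
    weights: Counter = Counter()
    for source, target, _label in edges:
        # w(c, c'): number of edges connecting the concept to each neighbour
        if source == concept and target != concept:
            weights[target] += 1
        elif target == concept and source != concept:
            weights[source] += 1
    total = sum(weights.values())
    if total == 0:
        return 0.0  # isolated concept: no uncertainty
    return -sum((w / total) * math.log(w / total) for w in weights.values())


# Invented toy graph
edges = [
    ("Agent", "Person", "rdfs:subClassOf"),
    ("Agent", "Organisation", "rdfs:subClassOf"),
    ("Agent", "Person", "rdfs:domain"),
    ("Person", "Writer", "rdfs:subClassOf"),
]
h_agent = network_entropy("Agent", edges)
h_writer = network_entropy("Writer", edges)
```

A concept with a single neighbour, such as `Writer` here, has zero entropy, while a concept linked to several neighbours scores higher and is therefore treated as more abstract.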
The immediate neighbours
of concept c
Measuring Abstraction: Network Centrality
• Premise: the more central a concept node is to a network, the greater its abstraction
  ◦ Given the increased information flow through the node
• We gauge centrality using two measures:
  ◦ Degree Centrality
    ▪ Number of connections from a concept node divided by vertex set size
  ◦ Eigenvector Centrality
    ▪ Determine the position of the concept node based on the eigenstructure of the concept network
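Both centrality measures can be illustrated with a dependency-free sketch. The graph representation (undirected adjacency sets), the power-iteration settings and the toy class hierarchy are assumptions for the example; eigenvector centrality is approximated here by power iteration on the adjacency matrix plus the identity, which has the same eigenvectors as the adjacency matrix but avoids oscillation on bipartite graphs.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected concept graph as adjacency sets


def degree_centrality(graph: Graph, node: str) -> float:
    """Connections of a node divided by the vertex set size."""
    return len(graph[node]) / len(graph)


def eigenvector_centrality(graph: Graph, iterations: int = 200,
                           tol: float = 1e-10) -> Dict[str, float]:
    """Approximate the dominant eigenvector by power iteration on (A + I)."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # Each node keeps its own score (the identity shift) and adds its neighbours'
        updated = {n: scores[n] + sum(scores[m] for m in graph[n]) for n in graph}
        norm = sum(v * v for v in updated.values()) ** 0.5
        updated = {n: v / norm for n, v in updated.items()}
        converged = max(abs(updated[n] - scores[n]) for n in graph) < tol
        scores = updated
        if converged:
            break
    return scores


# Invented toy class hierarchy
toy_graph: Graph = {
    "Thing": {"Agent", "Place", "Work"},
    "Agent": {"Thing", "Person"},
    "Place": {"Thing"},
    "Work": {"Thing"},
    "Person": {"Agent"},
}
scores = eigenvector_centrality(toy_graph)
```

In the toy hierarchy the root class `Thing` receives the highest score under both measures, matching the premise that central nodes are the more abstract ones.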
Measuring Abstraction: Statistical
Subsumption
• Premise: generality of a concept can be measured through the number of concepts that it is broader than [Schmitz et al. 2006]
• Graph semantics are used to measure the number of specialisations/narrowings of a concept…
… in order to count how many concepts a given concept c is more general than (we use DBPedia datasets as our concept graphs, which is explained in the following section).
$$SUB(c) = \left|\{c' : c' \in V,\ e_{cc'} \in E,\ lb(e_{cc'}) \in \{\texttt{<skos:narrower>}, \texttt{<rdfs:subClassOf>}\}\}\right| \tag{8}$$
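Statistical subsumption amounts to counting labelled edges, as the sketch below illustrates. The triple layout and the reading of edge direction (the concept is the subject of `skos:narrower` and the object of `rdfs:subClassOf`) are assumptions made for this example, and the resource names are invented.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def statistical_subsumption(concept: str, triples: Iterable[Triple]) -> int:
    """Count distinct concepts that the given concept is more general than."""
    narrower = set()
    for subject, predicate, obj in triples:
        if predicate == "skos:narrower" and subject == concept:
            narrower.add(obj)       # concept skos:narrower obj
        elif predicate == "rdfs:subClassOf" and obj == concept:
            narrower.add(subject)   # subject rdfs:subClassOf concept
    return len(narrower)


# Invented toy triples
triples = [
    ("dbo:Person", "rdfs:subClassOf", "dbo:Agent"),
    ("dbo:Organisation", "rdfs:subClassOf", "dbo:Agent"),
    ("dbo:Writer", "rdfs:subClassOf", "dbo:Person"),
    ("cat:Fantasy", "skos:narrower", "cat:Discworld"),
]
sub_agent = statistical_subsumption("dbo:Agent", triples)
```

Only direct specialisations are counted here; a transitive count would need a graph traversal instead.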
Key Player Problem. The final measure of abstraction that we use is taken
from Navigli & Lapata [7] and attempts to measure the extent to which a given
node in a network is a key player in the network’s topology; that is, the extent to
which it is important for information flow through the network. To compute this
If the label of the predicate denotes a specialisation/narrowing
Measuring Abstraction: Key Player Problem
• Premise: concept node position as a key player in network topology can be used to gauge its abstraction [Navigli & Lapata, 2010]
  ◦ Measuring its importance for information flow
• To derive Key Player Problem measure:
  ◦ Measure shortest distance from each concept to every other
  ◦ Take the sum of the reciprocal of these distances
  ◦ Normalise the sum by vertex set size
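The three steps listed above can be sketched with a breadth-first search. The sketch assumes an unweighted, undirected graph stored as adjacency sets and divides by the vertex set size as the slide states; unreachable nodes are assumed to contribute nothing to the sum. The toy graph is invented.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected concept graph as adjacency sets


def key_player_score(graph: Graph, node: str) -> float:
    """Sum of reciprocal shortest distances to other nodes, over vertex set size."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    # Nodes never reached by the search are simply absent from `distances`
    total = sum(1.0 / d for other, d in distances.items() if other != node)
    return total / len(graph)


# Invented toy class hierarchy
toy_graph: Graph = {
    "Thing": {"Agent", "Place", "Work"},
    "Agent": {"Thing", "Person"},
    "Place": {"Thing"},
    "Work": {"Thing"},
    "Person": {"Agent"},
}
kpp_thing = key_player_score(toy_graph, "Thing")
kpp_person = key_player_score(toy_graph, "Person")
```

The root `Thing` is close to every other node and scores higher than the leaf `Person`, so it would be judged the more abstract concept.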
Approach Evaluation
Retrieve community posts in time window [t,t’] (output: P)
→ Derive community concept model (output: A)
→ Select concept using composite function (output: c)
→ Measure concept specificity
→ Rank forums by specificity values (predicted rank d̂)
→ Generate ground truth rank (d)
→ Compare predicted and ground truth {d̂, d}
We test combinations (20 in total) of:
•  Composite function (2)
•  Abstraction Measures (5)
•  Concept Graphs (2)
Experimental Setup
• Dataset:
  ◦ Irish community message board Boards.ie
  ◦ 230 forums selected (filtered out low activity forums)
• Selecting window of analysis:
  ◦ Start date = 23/04/2005. Width = 1 week
within a k-week window, and found the densities to all be normally-distributed with variance in their tails and skews. We wanted to select the most stable distribution of posts across the forums and therefore measured the kurtosis and the skewness of each window size’s distribution, as shown in Figure 1(b). We then chose the week that produced the minimum of these measures: 1 week. By choosing this time period we are provided with reduced variation in the forum post distribution and therefore a stable picture, with no large fluctuations, of community activity.
[Figure 1: (a) boxplot of posts density (posts-per-day) in 2005; (b) kurtosis and skewness of density distributions against number of weeks]
Fig. 1. Plots of posts-per-day distribution in 2005 (1(a)) and the distribution properties of posts-per-forum in increasing week windows from 23/3/2005 (1(b)).
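The window selection can be illustrated by computing the two shape statistics for each candidate width. This sketch uses population moments and non-excess kurtosis, which is an assumption, since the text does not say which estimator was used; the post counts are invented.

```python
from typing import Dict, Sequence, Tuple


def central_moment(values: Sequence[float], order: int) -> float:
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values: Sequence[float]) -> float:
    """Third standardised moment (assumes non-zero variance)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values: Sequence[float]) -> float:
    """Fourth standardised moment, not excess kurtosis (assumes non-zero variance)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Invented posts-per-forum counts for window widths of 1 and 2 weeks
posts_per_forum_by_width: Dict[int, Sequence[float]] = {
    1: [12, 15, 14, 13, 16],
    2: [20, 22, 25, 24, 90],
}
shape: Dict[int, Tuple[float, float]] = {
    width: (skewness(counts), kurtosis(counts))
    for width, counts in posts_per_forum_by_width.items()
}
```

Comparing the entries of `shape` across widths mirrors the comparison plotted in Figure 1(b): the width with the smallest skewness and kurtosis gives the most stable picture of activity.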
Experimental Setup: Concept Graphs
• Concept models come in two flavours:
  a) Entity types (classes that an entity is an instance of)
  b) Categories (skos categories that the entity belongs in)
• Experiments use two concept graph types:
  a) Type Graph: DBPedia Ontology Graph
    ▪ Nodes: classes
    ▪ Edges: ontological relations
  b) Category Graph: DBPedia Category Structure
    ▪ Nodes: categories
    ▪ Edges: skos relations (broader, narrower)
Experimental Setup: Evaluation Measures
• Compare predicted rank against the ground truth rank
• Measure rank quality using:
  ◦ Kendall Tau-b (1 is better)
    ▪ Difference in the number of concordant and discordant pairs
  ◦ Impurity@k (0 is better)
    ▪ Distance from each wrongly positioned forum to its true position
    ▪ Measure for k={1,5,10,20,50,100} and average
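Kendall Tau-b can be sketched directly from its pairwise definition. The implementation below is a plain quadratic-time version written for illustration; the level sequences are the ordered forum levels from the example ranking table in the accompanying paper excerpt (ground truth, M1 and M2), and the function name is an assumption.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[int], y: Sequence[int]) -> float:
    """Kendall Tau-b between two equally long sequences, with tie correction."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied in both: ignored by Tau-b
            elif dx == 0:
                ties_x += 1         # tied only in x
            elif dy == 0:
                ties_y += 1         # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denominator = math.sqrt((concordant + discordant + ties_x)
                            * (concordant + discordant + ties_y))
    return (concordant - discordant) / denominator if denominator else 0.0


# Ordered forum levels taken from the example ranking table
ground_truth = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
model_1 = [2, 2, 3, 3, 1, 2, 3, 1, 3, 2]
model_2 = [1, 1, 2, 2, 2, 3, 2, 3, 3, 3]

tau_m1 = kendall_tau_b(ground_truth, model_1)
tau_m2 = kendall_tau_b(ground_truth, model_2)
```

M2 misplaces only one pair of neighbouring forums, so it scores much closer to 1 than M1 does.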
show:broader relations. Our evaluation therefore not only looks for the optimum combination of abstraction measure and composite function, but also which concept graph to use: the Type graph or the Category graph.
Table 1. Example rankings of forums in two predicted ranks from model 1 (M1) and model 2 (M2) together with the ground truth. The label function l(.) returns the level of the forum from the ground truth. Our evaluation measures (Kendall τb and Impurity@k) are provided with the ordered levels as input.
Rank Index | GT: d | GT: l(d) | M1: d̂1 | M1: l(d̂1) | M2: d̂2 | M2: l(d̂2)
1  | a | 1 | c | 2 | a | 1
2  | b | 1 | d | 2 | b | 1
3  | c | 2 | g | 3 | c | 2
4  | d | 2 | h | 3 | d | 2
5  | e | 2 | a | 1 | f | 2
6  | f | 2 | e | 2 | g | 3
7  | g | 3 | i | 3 | e | 2
8  | h | 3 | b | 1 | h | 3
9  | i | 3 | j | 3 | i | 3
10 | j | 3 | f | 2 | j | 3
Evaluation Measures. To evaluate our approach we use the different combinations of: a) composite functions, b) abstraction measures, and c) concept graphs, to produce a predicted rank (d̂), ordering the most specific forum to the most general, which is then compared against a ground truth rank (d). The ground truth rank of the forums is derived from the hierarchical structure of Boards.ie which allows a given forum to be declared as either a parent or a child of another forum, thereby creating a nested structure. In this setting there are three levels that a given forum can be placed in: 1 is most specific, 3 is most general and 2 is in-between. In order to aid comprehension of our evaluation setting we present example rankings produced by two hypothetical models (M1 and M2) in Table 1 along with the ground truth (GT). We refer to this evaluation setting as level-based ranking as each model (M1, M2) returns a level ordering (using a label
Level-based Ranking
Experiments: Results
• Best model: Eigenvector Centrality with Type Graph
  ◦ For full rank: concept frequency
  ◦ For top-k: CF-IFF
• Type graph > category graph
(Type graph with Concept Frequency and Eigenvector Centrality) we do slightly
worse than the random baseline, thereby failing to achieve the best performance
when focussing on top-k ranks.
[Figure 2: four panels comparing the Concept Frequency and CF-IFF composite functions across the abstraction measures (Network Entropy, Degree Centrality, Eigenvector Centrality, Statistical Subsumption, Key Player Problem): (a) Types - Kendall τb, (b) Categories - Kendall τb, (c) Types - Impurity@k, (d) Categories - Impurity@k. Annotations: lower is better; random model baseline]
Fig. 2. Plots of the results obtained when measuring forum specificity using: a) the
Evaluation: Qualitative Insights
• ‘Discworld’ appears top for Concept Frequency
  ◦ Both measures return a similarly specific assessment
• ‘Subscribers’ appears in CF-IFF lists:
  ◦ Selected by the function, but different values obtained using the measures
…produced by the models. Similarities are evident when the same composite function is used: Discworld appears at the top of both abstraction measures when using Concept Frequency, indicating that the concept selected from this forum has the same specificity levels for both abstraction measures, while Subscribers, despite being a mid-level forum, appears towards the top rank of each abstraction measure when using CF-IFF, indicating the existence of a concept unique to this forum which shares a similar specificity level across the measures. Such qualitative analysis indicates that despite the composite functions selecting the same concept to measure the abstraction of, the measures produce, in general, different rankings based on the concept’s network position.
Table 2. Forum rankings using the Type Graph and different combinations of composite functions and abstraction measures. The integers in parentheses represent the level of the forum on Boards.ie: 1 = most specific, 3 = most general.
Concept Frequency: Network Entropy | Concept Frequency: Eigenvector Centrality | CF-IFF: Network Entropy | CF-IFF: Eigenvector Centrality
Discworld (1) | Discworld (1) | Languages (1) | Magic the Gathering (1)
The Cuckoo’s Nest (2) | Angling (2) | Hunting (1) | Subscribers (2)
Models (2) | Paganism (1) | File Exchange (2) | Unreal (2)
Slydice Specials (1) | Feedback (2) | Game Threads (1) | LAN Parties (2)
Battlestar Galactica (1) | Personal Issues (2) | Magic the Gathering (1) | World of Warcraft (1)
FS Motors (1) | Mythology (2) | Bangbus (1) | Role Playing (2)
Gadgets (1) | Films (1) | Biology & Medicine (2) | Midwest (2)
FS Music Equipment (1) | Business Managem’ (1) | Snooker & Pool (2) | Game Threads (1)
Pro Evolution Soccer (2) | Xbox (1) | Subscribers (2) | GAA (2)
Call of Duty (2) | Help Desk (2) | HE Video Players (1) | Midlands (2)
Anime & Manga (2) | DIT (2) | Discworld (1) | Discworld (1)
Conclusions
• Presented an approach to measure the topical specificity of online community forums:
  ◦ Extracted entities from forums, recorded frequencies
  ◦ Selected concepts to measure using composite functions
  ◦ Measured concept specificity using information-theoretic measures
• Findings showed:
  a) Type graph > category structures
  b) Eigenvector centrality was the best measure
  c) Differences in composite function…
    ▪ Top ranks: CF-IFF
    ▪ Full rank: Concept Frequency
Current and Future Work
• Community tracking
  ◦ Grouped high and low specificity communities
  ◦ Found significantly greater semantic divergence in low specificity communities
• Attention Dynamics Theory
  ◦ Plan to group communities by topical specificity
  ◦ Examine how attention dynamics differ
  ◦ Learn heuristics for dynamic model adaptation
The Semantic Evolution of General and Specific Communities
Matthew Rowe (1) and Claudia Wagner (2)
1. School of Computing and Communications, Lancaster University, Lancaster, UK. m.rowe@lancaster.ac.uk
2. Institute for Information and Communication Technologies, JOANNEUM Research, Graz, Austria. claudia.wagner@joanneum.at
Motivation
• Past work [1] examined community lifecycle events (creation, growth, etc.), eschewing topical dynamics
• Our prior work [2] found a link between the topical focus of a community and its attention dynamics
  • E.g. ‘Golf’ community forum required posted content to exactly match the community
• Changes in a community’s topics require content injection models to be re-trained
  • Expensive and time-consuming process, and often performed offline
• Our solution: measure community specificity, then examine how community evolution relates to specificity
References
[1] Q. Hong, S. Kim, S. C. Cheung, and C. Bird. Understanding a developer social network and its evolution. In Proceedings of the 2011 27th IEEE International Conference on Software Maintenance, ICSM ’11, pages 323–332, Washington, DC, USA. (2011)
[2] C. Wagner, M. Rowe, M. Strohmaier, and H. Alani. Ignorance isn’t bliss: an empirical analysis of attention patterns in online communities. In AES Conference on Social Computing. (2012)
[3] M. Rowe, C. Wagner, M. Strohmaier, and H. Alani. Measuring the Topical Specificity of Online Communities. In Proceedings of the Extended Semantic Web Conference. Montpellier, France. (2013)
Measuring Topical Specificity
• Dataset: 230 community forums from Boards.ie
• Took all posts published within a 1-week window from 23/03/2005
• Extracted entities from posts using Zemanta
• Identified the type of each entity using DBPedia ontology
  <entity> rdf:type <class>
• Measured the specificity of each forum [3] using combinations of:
  • Class abstraction measures (e.g. Network entropy)
  • Composite Functions (e.g. Most specific concept)
• Compared performance against the Knuth Shuffle (random model)
• Evaluated using Kendall Tau-b and Max-Levels (proposed measure)
  • Max-levels judges the maximal potential outlier level
• Best model: Eigenvector Centrality with Concept Frequency
[Figure: MaxLevels scores for each abstraction measure (Network Entropy, Degree Centrality, Eigenvector Centrality, HITS Authority, HITS Hub, Statistical Subsumption, Key Player Problem) under each composite function (Most Spec, Mean Spec, Concept Freq, CF-IFF)]
[Figure: density of forum specificity values for Eigenvector Centrality + Concept Frequency]
Semantic Evolution
• Divided forums up into 10 equal-frequency bins based on specificity
  • Measured: Eigenvector Centrality with Concept Frequency
  • High-specificity forums = members of top bin
  • Low-specificity forums = members of bottom bin
• For these forums, randomly selected four 1-week periods and derived concepts over each period, and concept vectors: {c1, c2, c3, c4}
• Calculated cosine similarity between consecutive concept vectors
• Examined difference between cosine similarity distributions of high and low forums:
  • Low=0.783, high=0.854. p<0.05 with Student T-test (2 tailed)
• The result indicates that general communities exhibit greater semantic drift than topically specific communities
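The comparison of consecutive concept vectors can be sketched as below. The vectors are assumed to be sparse dictionaries of concept frequencies, one per 1-week period; the function names and the example counts are invented for illustration.

```python
import math
from typing import Dict, List

ConceptVector = Dict[str, float]  # concept -> frequency in one period


def cosine_similarity(a: ConceptVector, b: ConceptVector) -> float:
    """Cosine of the angle between two sparse concept vectors."""
    dot = sum(value * b.get(concept, 0.0) for concept, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def consecutive_similarities(vectors: List[ConceptVector]) -> List[float]:
    """Similarity between each period's vector and the next one."""
    return [cosine_similarity(u, v) for u, v in zip(vectors, vectors[1:])]


# Invented concept vectors for four 1-week periods of one forum
periods = [
    {"dbo:Book": 4, "dbo:Writer": 2},
    {"dbo:Book": 5, "dbo:Writer": 1},
    {"dbo:Book": 3, "dbo:Film": 2},
    {"dbo:Film": 4},
]
similarities = consecutive_similarities(periods)
```

Four periods yield three similarity values per forum; lower values indicate greater semantic drift between consecutive periods.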
[Figure: cumulative distribution functions F(Cumulative Specificity) and F(Cumulative Specificity Difference), each comparing High Specificity and Low Specificity communities]
@mrowebot
m.rowe@lancaster.ac.uk
www.lancs.ac.uk/staff/rowem
Questions?
Multidimensional Patterns of Disturbance in Digital Social Networks
 

Similaire à Measuring the Topical Specificity of Online Communities

Anticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsAnticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsMatthew Rowe
 
KASW'08 - Invited Talk
KASW'08 - Invited TalkKASW'08 - Invited Talk
KASW'08 - Invited TalkRalf Klamma
 
Bootstrap Austin Community
Bootstrap  Austin  CommunityBootstrap  Austin  Community
Bootstrap Austin CommunityBijoy Goswami
 
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...IEEEFINALYEARSTUDENTPROJECTS
 
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...IEEEFINALYEARSTUDENTPROJECT
 
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...IEEEMEMTECHSTUDENTSPROJECTS
 
On Joint Modeling of Topical Communities and Personal Interest in Microblogs
On Joint Modeling of Topical Communities and Personal Interest in MicroblogsOn Joint Modeling of Topical Communities and Personal Interest in Microblogs
On Joint Modeling of Topical Communities and Personal Interest in MicroblogsPC LO
 
Socialcom2011 discussionactivityprediction
Socialcom2011 discussionactivitypredictionSocialcom2011 discussionactivityprediction
Socialcom2011 discussionactivitypredictionWeGov project
 
Text analytics in social media
Text analytics in social mediaText analytics in social media
Text analytics in social mediaJeremiah Fadugba
 
Predicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebPredicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebMatthew Rowe
 
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...Matthew Rowe
 
Setting The Stage For Empirical Research In Virtual Social Networks
Setting The Stage For Empirical Research In Virtual Social NetworksSetting The Stage For Empirical Research In Virtual Social Networks
Setting The Stage For Empirical Research In Virtual Social Networksvia fCh
 
COMMUNITY DETECTION IN THE COLLABORATIVE WEB
COMMUNITY DETECTION IN THE COLLABORATIVE WEBCOMMUNITY DETECTION IN THE COLLABORATIVE WEB
COMMUNITY DETECTION IN THE COLLABORATIVE WEBIJMIT JOURNAL
 
Dynamic Data Community Discovery
Dynamic Data Community DiscoveryDynamic Data Community Discovery
Dynamic Data Community DiscoverySarang Rakhecha
 
Alluding Communities in Social Networking Websites using Enhanced Quasi-cliqu...
Alluding Communities in Social Networking Websites using Enhanced Quasi-cliqu...Alluding Communities in Social Networking Websites using Enhanced Quasi-cliqu...
Alluding Communities in Social Networking Websites using Enhanced Quasi-cliqu...IJMTST Journal
 
Ieml social recommendersystems
Ieml social recommendersystemsIeml social recommendersystems
Ieml social recommendersystemsAntonio Medina
 
Feedback Effects Between Similarity And Social Influence In Online Communities
Feedback Effects Between Similarity And Social Influence In Online CommunitiesFeedback Effects Between Similarity And Social Influence In Online Communities
Feedback Effects Between Similarity And Social Influence In Online CommunitiesPaolo Massa
 
Activating Research Collaboratories with Collaboration Patterns
Activating Research Collaboratories with Collaboration PatternsActivating Research Collaboratories with Collaboration Patterns
Activating Research Collaboratories with Collaboration PatternsCommunitySense
 

Similaire à Measuring the Topical Specificity of Online Communities (20)

Anticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsAnticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community Forums
 
KASW'08 - Invited Talk
KASW'08 - Invited TalkKASW'08 - Invited Talk
KASW'08 - Invited Talk
 
Bootstrap Austin Community
Bootstrap  Austin  CommunityBootstrap  Austin  Community
Bootstrap Austin Community
 
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
 
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...
 
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...
2014 IEEE JAVA DATA MINING PROJECT Discovering emerging topics in social stre...
 
On Joint Modeling of Topical Communities and Personal Interest in Microblogs
On Joint Modeling of Topical Communities and Personal Interest in MicroblogsOn Joint Modeling of Topical Communities and Personal Interest in Microblogs
On Joint Modeling of Topical Communities and Personal Interest in Microblogs
 
Notes on mining social media updated
Notes on mining social media updatedNotes on mining social media updated
Notes on mining social media updated
 
Socialcom2011 discussionactivityprediction
Socialcom2011 discussionactivitypredictionSocialcom2011 discussionactivityprediction
Socialcom2011 discussionactivityprediction
 
Text analytics in social media
Text analytics in social mediaText analytics in social media
Text analytics in social media
 
Predicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebPredicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic Web
 
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
 
Setting The Stage For Empirical Research In Virtual Social Networks
Setting The Stage For Empirical Research In Virtual Social NetworksSetting The Stage For Empirical Research In Virtual Social Networks
Setting The Stage For Empirical Research In Virtual Social Networks
 
COMMUNITY DETECTION IN THE COLLABORATIVE WEB
COMMUNITY DETECTION IN THE COLLABORATIVE WEBCOMMUNITY DETECTION IN THE COLLABORATIVE WEB
COMMUNITY DETECTION IN THE COLLABORATIVE WEB
 
Dynamic Data Community Discovery
Dynamic Data Community DiscoveryDynamic Data Community Discovery
Dynamic Data Community Discovery
 
Open science - Science 2.0
Open science - Science 2.0Open science - Science 2.0
Open science - Science 2.0
 
Alluding Communities in Social Networking Websites using Enhanced Quasi-cliqu...
Alluding Communities in Social Networking Websites using Enhanced Quasi-cliqu...Alluding Communities in Social Networking Websites using Enhanced Quasi-cliqu...
Alluding Communities in Social Networking Websites using Enhanced Quasi-cliqu...
 
Ieml social recommendersystems
Ieml social recommendersystemsIeml social recommendersystems
Ieml social recommendersystems
 
Feedback Effects Between Similarity And Social Influence In Online Communities
Feedback Effects Between Similarity And Social Influence In Online CommunitiesFeedback Effects Between Similarity And Social Influence In Online Communities
Feedback Effects Between Similarity And Social Influence In Online Communities
 
Activating Research Collaboratories with Collaboration Patterns
Activating Research Collaboratories with Collaboration PatternsActivating Research Collaboratories with Collaboration Patterns
Activating Research Collaboratories with Collaboration Patterns
 

Plus de Matthew Rowe

Social Computing Research with Apache Spark
Social Computing Research with Apache SparkSocial Computing Research with Apache Spark
Social Computing Research with Apache SparkMatthew Rowe
 
Predicting Online Community Churners using Gaussian Sequences
Predicting Online Community Churners using Gaussian SequencesPredicting Online Community Churners using Gaussian Sequences
Predicting Online Community Churners using Gaussian SequencesMatthew Rowe
 
Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...
Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...
Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...Matthew Rowe
 
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting RatingsSemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings Matthew Rowe
 
The Semantic Evolution of Online Communities
The Semantic Evolution of Online CommunitiesThe Semantic Evolution of Online Communities
The Semantic Evolution of Online CommunitiesMatthew Rowe
 
From Mining to Understanding: The Evolution of Social Web Users
From Mining to Understanding: The Evolution of Social Web UsersFrom Mining to Understanding: The Evolution of Social Web Users
From Mining to Understanding: The Evolution of Social Web UsersMatthew Rowe
 
Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...Matthew Rowe
 
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...Matthew Rowe
 
Identity: Physical, Cyber, Future
Identity: Physical, Cyber, FutureIdentity: Physical, Cyber, Future
Identity: Physical, Cyber, FutureMatthew Rowe
 
Attention Economics in Social Web Systems
Attention Economics in Social Web SystemsAttention Economics in Social Web Systems
Attention Economics in Social Web SystemsMatthew Rowe
 
What makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositionsWhat makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositionsMatthew Rowe
 
Existing Research and Future Research Agenda
Existing Research and Future Research AgendaExisting Research and Future Research Agenda
Existing Research and Future Research AgendaMatthew Rowe
 
Tutorial: Social Semantics
Tutorial: Social SemanticsTutorial: Social Semantics
Tutorial: Social SemanticsMatthew Rowe
 
Modelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online CommunitiesModelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online CommunitiesMatthew Rowe
 
Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web SystemsUsing Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web SystemsMatthew Rowe
 
Semantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataMatthew Rowe
 
Forecasting Audience Increase on Youtube
Forecasting Audience Increase on YoutubeForecasting Audience Increase on Youtube
Forecasting Audience Increase on YoutubeMatthew Rowe
 
PhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social DataPhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social DataMatthew Rowe
 
Integrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous SourcesIntegrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous SourcesMatthew Rowe
 
Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesMatthew Rowe
 

Plus de Matthew Rowe (20)

Social Computing Research with Apache Spark
Social Computing Research with Apache SparkSocial Computing Research with Apache Spark
Social Computing Research with Apache Spark
 
Predicting Online Community Churners using Gaussian Sequences
Predicting Online Community Churners using Gaussian SequencesPredicting Online Community Churners using Gaussian Sequences
Predicting Online Community Churners using Gaussian Sequences
 
Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...
Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...
Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...
 
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting RatingsSemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
 
The Semantic Evolution of Online Communities
The Semantic Evolution of Online CommunitiesThe Semantic Evolution of Online Communities
The Semantic Evolution of Online Communities
 
From Mining to Understanding: The Evolution of Social Web Users
From Mining to Understanding: The Evolution of Social Web UsersFrom Mining to Understanding: The Evolution of Social Web Users
From Mining to Understanding: The Evolution of Social Web Users
 
Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...
 
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
 
Identity: Physical, Cyber, Future
Identity: Physical, Cyber, FutureIdentity: Physical, Cyber, Future
Identity: Physical, Cyber, Future
 
Attention Economics in Social Web Systems
Attention Economics in Social Web SystemsAttention Economics in Social Web Systems
Attention Economics in Social Web Systems
 
What makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositionsWhat makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositions
 
Existing Research and Future Research Agenda
Existing Research and Future Research AgendaExisting Research and Future Research Agenda
Existing Research and Future Research Agenda
 
Tutorial: Social Semantics
Tutorial: Social SemanticsTutorial: Social Semantics
Tutorial: Social Semantics
 
Modelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online CommunitiesModelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online Communities
 
Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web SystemsUsing Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
 
Semantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic Data
 
Forecasting Audience Increase on Youtube
Forecasting Audience Increase on YoutubeForecasting Audience Increase on Youtube
Forecasting Audience Increase on Youtube
 
PhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social DataPhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social Data
 
Integrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous SourcesIntegrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous Sources
 
Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL Rules
 

Dernier

Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Dernier (20)

Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Measuring the Topical Specificity of Online Communities

  • 1. MEASURING THE TOPICAL SPECIFICITY OF ONLINE COMMUNITIES. Extended Semantic Web Conference 2013, Montpellier, France. MATTHEW ROWE (1), CLAUDIA WAGNER (2), MARKUS STROHMAIER (3) AND HARITH ALANI (4). (1) SCHOOL OF COMPUTING AND COMMUNICATIONS, LANCASTER UNIVERSITY, LANCASTER, UK; @MROWEBOT | M.ROWE@LANCASTER.AC.UK. (2) INSTITUTE FOR INFORMATION AND COMMUNICATION TECHNOLOGIES, JOANNEUM RESEARCH, GRAZ, AUSTRIA. (3) KNOWLEDGE MANAGEMENT INSTITUTE AND KNOW CENTRE, GRAZ UNIVERSITY OF TECHNOLOGY, GRAZ, AUSTRIA. (4) KNOWLEDGE MEDIA INSTITUTE, THE OPEN UNIVERSITY, MILTON KEYNES, UK.
  • 2. Why measure the topical specificity of online communities?
  • 3. Sub-Community Creation. [Belak et al. 2011] identified community (topical) drift; we found the same. Understanding topical specificity is important for: tracking community focus; suggesting new communities. [Figure: two plots, F(Entropy) against Entropy and F(JS-Divergence) against JS-Divergence, each comparing High Specificity and Low Specificity communities.]
  • 4. Theory of Attention Dynamics. [Slide shows the first page of the paper "Ignorance isn’t Bliss: An Empirical Analysis of Attention Patterns in Online Communities" by Claudia Wagner (Institute of Information and Communication Technologies, JOANNEUM RESEARCH, Graz, Austria; claudia.wagner@joanneum.at), Matthew Rowe and Harith Alani (Knowledge Media Institute, The Open University, Milton Keynes, UK; m.c.rowe@open.ac.uk, halani@open.ac.uk) and Markus Strohmaier (Knowledge Management Institute and Know-Center, Graz University of Technology, Graz, Austria; markus.strohmaier@tugraz.at).]
    Abstract: Online community managers work towards building and managing communities around a given brand or topic. A risk imposed on such managers is that their community may die out and its utility diminish to users. Understanding what drives attention to content and the dynamics of discussions in a given community informs the community manager and/or host with the factors that are associated with attention, allowing them to detect a reduction in such factors. In this paper we gain insights into the idiosyncrasies that individual community forums exhibit in their attention patterns and how the factors that impact activity differ. We glean such insights through a two-stage approach that functions by (i) differentiating between seed posts - i.e. posts that solicit a reply - and non-seed posts - i.e. posts that did not get any replies, and (ii) predicting the level of attention that seed posts will generate. We explore the effectiveness of a range of features for predicting discussions and analyse their potential impact on discussion initiation and progress. Our findings show that the discussion behaviour of different communities exhibit interesting differences in terms of how attention is generated. Our results show amongst others that the purpose of a community as well as the specificity of the topic of a community impact which factors drive the reply behaviour of a community. For example, communities around very specific topics require posts to fit to the topical focus of the community in order to attract attention while communities around more general topics do not have this requirement. We also found that the factors which impact the start of discussions in communities often differ from the factors which impact the length of discussions.
    Index Terms: attention, online communities, discussion, pop- [...]
    [...] another. For example, what catches the attention of users in a question-answering or a support-oriented community may not have the same effect in conversation-driven or event-driven communities. In this paper we use the number of replies that a given post on a community message board yields as a measure of its attention. To explore these and related questions, our paper sets out to study the following two research questions: 1) Which factors impact the attention level a post gets in certain community forums? 2) How do these factors differ between individual community forums? Understanding what factors are associated with attention in different communities could inform managers and hosts of community forums with the know-how of what drives attention and what catches the attention of users in their community. Empowered with such information, managers could then detect changes in such factors that could potentially impact community activity and cause the utility of the community to alter. We approach our research questions through an empirical study of attention patterns in 20 randomly selected forums on the Irish community message board Boards.ie. Our study was facilitated through a two-stage approach that (i) differentiates between seed posts - i.e. thread starters on a community [...]
    Slide callout: Community specificity has a bearing on attention dynamics.
  • 5. Community Recommendation
      ◦ Users interested in a new topic often visit online communities
      ◦ Recommending a specific community could overwhelm the user
          ▪ Nuanced language, expert terms
      ◦ Need a comparative assessment of specificity between communities
      [Diagram label: has sub-forum]
  • 6. Can we empirically characterise how specific a given community is based on what its users discuss?
  • 7. What do we mean by ‘specificity’?
      ◦ We interpret a community forum’s specificity in relation to its parent…
  • 8. Measuring Topical Specificity: Our Approach
      Retrieve community posts in time window [t,t’] (P) → Derive community concept model (A) → Select concept using composite function (c) → Measure concept abstraction
  • 11. Measuring Topical Specificity: Our Approach
      Retrieve community posts in time window [t,t’] (P) → Derive community concept model (A) → Select concept using composite function (c) → Measure concept specificity
  • 12. Extracting Concepts and Concept Models
      ◦ Given: posts P published in forum f within [t,t’]
          ▪ Extract entities from each post’s content using Zemanta
          ▪ Get the concept that each entity is:
              a) an instance of: <entity rdf:type class>
              b) in the category of: <entity dcterms:subject category>
      ◦ To build the concept model for the forum:
          ▪ Record the frequency of each concept’s occurrence
      [Paper excerpt] […] that characterise the forum in the time period. We do this by processing each post content $s \in S^{t' t''}_f$ using a concept extraction tool $\Psi(s)$ to return the set of concepts related to the content of $s$. We build the concept model for the community by recording the frequency of concept occurrences in the input post sets, returning $A^{t' t''}_f$. This set is derived using the following construct:
      $$A^{t' t''}_f[c_i] = |\{c_i : c_i \in \Psi(s),\ s \in S^{t' t''}_f\}| \qquad (2)$$
      [Callouts: $\Psi$ is the concept extraction model; $S^{t' t''}_f$ is the set of post contents]
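The concept-model construction in Eq. (2) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_extractor` and `TOY_LOOKUP` are hypothetical stand-ins for the Zemanta entity extraction and the `rdf:type` lookup, and the concept names are made up for the example.

```python
from collections import Counter
from typing import Callable, Iterable, Set

# Hypothetical entity-to-concept lookup standing in for Zemanta plus rdf:type resolution.
TOY_LOOKUP = {
    "discworld": "dbo:Book",
    "pratchett": "dbo:Writer",
    "xbox": "dbo:Device",
}

def toy_extractor(text: str) -> Set[str]:
    """Return the concepts for the words of `text` that appear in the toy lookup."""
    return {TOY_LOOKUP[w] for w in text.lower().split() if w in TOY_LOOKUP}

def build_concept_model(posts: Iterable[str],
                        extract: Callable[[str], Set[str]]) -> Counter:
    """Count, per concept, how many posts have it in their extracted concept set."""
    model: Counter = Counter()
    for post in posts:
        model.update(set(extract(post)))
    return model

posts = ["Discworld by Pratchett", "A new Discworld novel", "Xbox thread"]
concept_model = build_concept_model(posts, toy_extractor)
```

Here each post contributes at most once per concept, which is one reading of Eq. (2); counting every entity mention instead would only require dropping the `set(...)` call and returning a list from the extractor.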
  • 13. Selecting Concepts using Composite Functions: Concept Frequency
      Choose the most frequently cited concept in the forum
      [Figure: two panels, Types and Categories]
  • 14. Selecting Concepts using Composite Functions: Concept Frequency-Inverse Forum Frequency
      ◦ Measures how unique a concept is to the forum
          ▪ High CF-IFF = unique forum concept given other forums
      ◦ We choose the concept that maximises CF-IFF…
      [Paper excerpt] 2. Concept Frequency-Inverse Forum Frequency: This function selects the most unique concept discussed in the forum with respect to all forums. This is a modification of the existing Term Frequency-Inverse Document Frequency measure used for term indexation. The Concept Frequency-Inverse Forum Frequency of each concept in a given forum is measured and the concept that returns the maximum value is chosen. The abstraction of this concept is then measured and the reciprocal of this value taken as the specificity of the forum. We define the Concept Frequency-Inverse Forum Frequency as follows:
      $$\text{cf-iff}(c, f, F) = \frac{|A^{t' t''}_f[c]|}{\max\left(A^{t' t''}_f[c'] : c' \in A^{t' t''}_f\right)} \times \log \frac{|F|}{|\{f \in F : c' \in A^{t' t''}_f,\ c' = c\}|} \qquad (3)$$
      [Callouts: the first factor is the normalised frequency of the concept appearing within the forum; the second factor is how common/rare a concept is across the forums]
      4.2 Concept Abstraction Measures
      The composite functions decide on which concept to measure based on either: a) the frequency of the concept in the forum, or b) the uniqueness of the concept with respect to the other forums. To measure concept abstraction we define five measures as follows, which either leverage the network structure surrounding a concept or use the semantics of relations in the concept graph.
      Network Entropy. Our first measure of concept abstraction (a(c)) is based on […]
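The two composite functions can be compared on a toy example. The sketch below assumes each forum's concept model is a plain dictionary of counts; the forum names, concept names and counts are invented for illustration, and `cf_iff` follows Eq. (3) with a natural logarithm (the base is not stated in the excerpt).

```python
import math
from typing import Dict

ConceptModels = Dict[str, Dict[str, int]]  # forum -> {concept: frequency}

def cf_iff(concept: str, forum: str, models: ConceptModels) -> float:
    """Normalised in-forum frequency times log inverse forum frequency (Eq. 3)."""
    model = models[forum]
    forums_with_concept = sum(1 for m in models.values() if concept in m)
    if forums_with_concept == 0:
        return 0.0
    normalised_cf = model.get(concept, 0) / max(model.values())
    return normalised_cf * math.log(len(models) / forums_with_concept)

def select_by_frequency(forum: str, models: ConceptModels) -> str:
    """Composite function 1: the most frequently cited concept in the forum."""
    return max(models[forum], key=models[forum].get)

def select_by_cf_iff(forum: str, models: ConceptModels) -> str:
    """Composite function 2: the concept that maximises CF-IFF."""
    return max(models[forum], key=lambda c: cf_iff(c, forum, models))

# Invented toy data: "Person" is frequent everywhere, "Book" only in one forum.
toy_models = {
    "discworld": {"Person": 8, "Book": 4},
    "xbox": {"Person": 3, "Device": 6},
    "golf": {"Person": 5, "Sport": 5},
}
by_frequency = select_by_frequency("discworld", toy_models)
by_cf_iff = select_by_cf_iff("discworld", toy_models)
```

On this data the two functions disagree: concept frequency picks the ubiquitous "Person", whereas CF-IFF picks "Book" because it occurs in no other forum.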
  • 15. Measuring Specificity
      ◦ Given a concept c selected using a composite function, how can we measure its specificity?
      ◦ Solution: use information-theoretic measures of abstraction…
          ▪ Abstraction of concept c: a(c)
          ▪ Specificity of concept c: 1/a(c)
      ◦ We examine five measures of abstraction…
  • 16. Measuring Abstraction: Network Entropy
      ◦ Premise: more abstract tags co-occur with many other tags [Benz et al. 2011]
          ▪ I.e. increase in variation of a random variable
      ◦ We adapt this for concepts…
          ▪ Derive co-occurrence frequencies between concepts using number of edges (relations) between them
          ▪ Define conditional probability of concept co-occurrence
      [Paper excerpt] […] work by [3] in which tag abstraction is measured through the uniformity of co-occurrences. The general premise is that a more abstract tag should co-occur with many other tags, thus producing a higher entropy - as there is more uncertainty associated with the term. In the context of our work we can apply the same notion, however we must adapt the notion of co-occurrence slightly to deal with concepts. We first need some preliminary definitions that allow network entropy, and the network-theoretic measures below, to be calculated, using the same definition as laid out in [4]: let $G = \{V, E, L\}$ denote a concept network, where $V$ is the set of concept nodes ($c \in V$), $e_{cc'} \in E$ is an edge, or link, connecting $c, c' \in V$, and $lb(e_{cc'}) \in L$ denotes a label of the edge - i.e. the predicate associating $c$ with $c'$. We can define the weight of the relation between two concepts $c$ and $c'$ by the number of times they are connected to one another in the graph: $w(c, c') = |\{e_{cc'} \in E\}|$. From this weight measurement, derived from concept co-occurrence, we then derive the conditional probability of $c$ appearing with $c'$ as follows, using $ego(c)$ to denote the ego-network of the concept $c$ - i.e. the triples in the immediate vicinity of $c$:
      $$p(c'|c) = \frac{w(c, c')}{\sum_{c'' \in ego(c)} w(c, c'')} \qquad (4)$$
      Now that we have defined the conditional probability of $c$ appearing with another concept $c'$, we define the network entropy of $c$ as follows:
      $$H(c) = -\sum_{c' \in ego(c)} p(c'|c) \log p(c'|c) \qquad (5)$$
      [Callout: $ego(c)$ contains the immediate neighbours of concept $c$]
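Equations (4) and (5) translate directly into code. The following sketch assumes the concept graph is given as a list of concept pairs (one entry per relation, so repeated pairs raise the weight) and treats edges as undirected, which is an assumption made here for simplicity; the class names in the toy data are illustrative only.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

def network_entropy(concept: str, edges: Iterable[Tuple[str, str]]) -> float:
    """Entropy of the co-occurrence distribution over the neighbours of `concept`."""
    weights: Counter = Counter()
    for a, b in edges:
        # w(c, c'): number of edges linking the concept to each neighbour.
        if a == concept and b != concept:
            weights[b] += 1
        elif b == concept and a != concept:
            weights[a] += 1
    total = sum(weights.values())
    if total == 0:
        return 0.0  # isolated concept: no uncertainty
    # p(c'|c) = w(c, c') / sum of weights over the ego-network (Eq. 4).
    return -sum((w / total) * math.log(w / total) for w in weights.values())

# Invented toy graph; ("Agent", "Person") appears twice, i.e. two relations.
toy_edges = [
    ("Agent", "Person"),
    ("Agent", "Organisation"),
    ("Agent", "Person"),
    ("Person", "Writer"),
]
entropy_agent = network_entropy("Agent", toy_edges)
entropy_writer = network_entropy("Writer", toy_edges)
```

A concept with a single neighbour, such as "Writer" here, gets zero entropy, while a concept whose relations are spread over several neighbours scores higher, matching the premise that abstract concepts co-occur with many others.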
  • 17. Measuring Abstraction: Network Centrality
      ◦ Premise: the more central a concept node is to a network, the greater its abstraction
          ▪ Given the increased information flow through the node
      ◦ We gauge centrality using two measures:
          ▪ Degree Centrality
              · Number of connections from a concept node divided by vertex set size
          ▪ Eigenvector Centrality
              · Determine the position of the concept node based on the eigenstructure of the concept network
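Both centrality measures can be illustrated on a small undirected graph. In this sketch the graph is an adjacency dictionary with invented class names; degree centrality divides by the vertex set size as the slide words it, and eigenvector centrality is approximated by power iteration on the adjacency matrix plus the identity (a standard shift that keeps the same eigenvectors while avoiding oscillation on tree-like graphs). It is an illustration under these assumptions rather than the implementation used in the experiments.

```python
import math
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency: node -> neighbours

def degree_centrality(graph: Graph) -> Dict[str, float]:
    """Number of connections of each node divided by the vertex set size."""
    n = len(graph)
    return {node: len(neighbours) / n for node, neighbours in graph.items()}

def eigenvector_centrality(graph: Graph, iterations: int = 200,
                           tol: float = 1e-12) -> Dict[str, float]:
    """Approximate the principal eigenvector by power iteration on A + I."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        updated = {node: scores[node] + sum(scores[nb] for nb in graph[node])
                   for node in graph}
        norm = math.sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        converged = max(abs(updated[node] - scores[node]) for node in graph) < tol
        scores = updated
        if converged:
            break
    return scores

# Invented toy hierarchy: "Thing" is linked to three classes, "Agent" to two.
toy_graph: Graph = {
    "Thing": {"Agent", "Place", "Work"},
    "Agent": {"Thing", "Person"},
    "Place": {"Thing"},
    "Work": {"Thing"},
    "Person": {"Agent"},
}
degree = degree_centrality(toy_graph)
eigen = eigenvector_centrality(toy_graph)
```

In the toy hierarchy the root-like node "Thing" is the most central under both measures, so it would be judged the most abstract and, via 1/a(c), the least specific.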
  • 18. Measuring Abstraction: Statistical Subsumption
      ◦ Premise: generality of a concept can be measured through the number of concepts that it is broader than [Schmitz et al. 2006]
      ◦ Graph semantics are used to measure the number of specialisations/narrowings of a concept…
      [Paper excerpt] […] order to count how many concepts a given concept $c$ is more general than (we use DBpedia datasets as our concept graphs, which is explained in the following section).
      $$SUB(c) = |\{c' : c' \in V,\ e_{cc'} \in E,\ lb(e_{cc'}) \in \{\text{<skos:narrower>}, \text{<rdfs:subClassOf>}\}\}| \qquad (8)$$
      [Callout: the condition on $lb(e_{cc'})$ holds if the label of the predicate denotes a specialisation/narrowing]
      Key Player Problem. The final measure of abstraction that we use is taken from Navigli & Lapata [7] and attempts to measure the extent to which a given node in a network is a key player in the network’s topology; that is, the extent to which it is important for information flow through the network. To compute this […]
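Eq. (8) amounts to counting labelled edges. The sketch below assumes the concept graph is available as (subject, predicate, object) triples in which an edge from c to c' carries the narrowing label, mirroring the equation as written; the category names are invented for the example.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Predicate labels treated as a specialisation/narrowing, as listed in Eq. (8).
NARROWING_LABELS = {"skos:narrower", "rdfs:subClassOf"}

def statistical_subsumption(concept: str, triples: Iterable[Triple]) -> int:
    """Count the distinct concepts reached from `concept` over narrowing edges."""
    return len({obj for subj, pred, obj in triples
                if subj == concept and pred in NARROWING_LABELS})

# Invented toy category graph.
toy_triples = [
    ("Category:Fantasy", "skos:narrower", "Category:Discworld"),
    ("Category:Fantasy", "skos:narrower", "Category:High_fantasy"),
    ("Category:Fantasy", "skos:related", "Category:Science_fiction"),
    ("Category:Discworld", "skos:narrower", "Category:Discworld_characters"),
]
sub_fantasy = statistical_subsumption("Category:Fantasy", toy_triples)
sub_discworld = statistical_subsumption("Category:Discworld", toy_triples)
```

The broader category subsumes more concepts and therefore counts as more abstract; only edges whose label is in `NARROWING_LABELS` contribute, so the `skos:related` link is ignored.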
  • 19. Measuring Abstraction: Key Player Problem
      ◦ Premise: concept node position as a key player in network topology can be used to gauge its abstraction [Navigli & Lapata, 2010]
          ▪ Measuring its importance for information flow
      ◦ To derive Key Player Problem measure:
          ▪ Measure shortest distance from each concept to every other
          ▪ Take the sum of the reciprocal of these distances
          ▪ Normalise the sum by vertex set size
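The three steps listed for the Key Player Problem measure can be sketched with a breadth-first search. The code assumes an unweighted, undirected adjacency dictionary with invented class names, skips unreachable nodes, and divides by the number of vertices, which is one reading of "vertex set size"; the paper's exact normalisation is not shown on the slide.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency: node -> neighbours

def key_player_score(concept: str, graph: Graph) -> float:
    """Sum of reciprocal shortest-path distances from `concept`, divided by |V|."""
    distances = {concept: 0}
    queue = deque([concept])
    while queue:  # breadth-first search gives shortest hop counts
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    reciprocal_sum = sum(1.0 / d for node, d in distances.items() if node != concept)
    return reciprocal_sum / len(graph)

# Invented toy hierarchy.
toy_graph: Graph = {
    "Thing": {"Agent", "Place", "Work"},
    "Agent": {"Thing", "Person"},
    "Place": {"Thing"},
    "Work": {"Thing"},
    "Person": {"Agent"},
}
kpp_thing = key_player_score("Thing", toy_graph)
kpp_person = key_player_score("Person", toy_graph)
```

Nodes that sit close to everything else, such as "Thing" in the toy graph, receive a higher score than peripheral nodes such as "Person".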
  • 20. Approach Evaluation
      Approach: Retrieve community posts in time window [t,t’] (P) → Derive community concept model (A) → Select concept using composite function (c) → Measure concept specificity
      Evaluation: Rank forums by specificity values (predicted rank $\hat{d}$) → Generate ground truth rank ($d$) → Compare predicted and ground truth ($\{\hat{d}, d\}$)
  • 24. Approach Evaluation
      Approach: Retrieve community posts in time window [t,t’] (P) → Derive community concept model (A) → Select concept using composite function (c) → Measure concept specificity
      Evaluation: Rank forums by specificity values (predicted rank $\hat{d}$) → Generate ground truth rank ($d$) → Compare predicted and ground truth ($\{\hat{d}, d\}$)
      We test combinations (20 in total) of:
          ▪ Composite function (2)
          ▪ Abstraction Measures (5)
          ▪ Concept Graphs (2)
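The count of 20 follows from the Cartesian product of the three design choices (2 × 5 × 2). A short sketch of how the experimental grid could be enumerated, using the names that appear elsewhere in the slides:

```python
from itertools import product

COMPOSITE_FUNCTIONS = ["Concept Frequency", "CF-IFF"]
ABSTRACTION_MEASURES = [
    "Network Entropy",
    "Degree Centrality",
    "Eigenvector Centrality",
    "Statistical Subsumption",
    "Key Player Problem",
]
CONCEPT_GRAPHS = ["Type Graph", "Category Graph"]

# One tuple per tested model configuration.
model_grid = list(product(COMPOSITE_FUNCTIONS, ABSTRACTION_MEASURES, CONCEPT_GRAPHS))
```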
  • 25. Experimental Setup
      ◦ Dataset:
          ▪ Irish community message board Boards.ie
          ▪ 230 forums selected (filtered out low activity forums)
      ◦ Selecting window of analysis:
          ▪ Start date = 23/04/2005. Width = 1 week
      [Paper excerpt] […] within a k-week window, and found the densities to all be normally-distributed with variance in their tails and skews. We wanted to select the most stable distribution of posts across the forums and therefore measured the kurtosis and the skewness of each window size’s distribution - as shown in Figure 1(b). We then chose the window width that produced the minimum of these measures: 1 week. By choosing this time period we are provided with reduced variation in the forum post distribution and therefore a stable picture, with no large fluctuations, of community activity.
      [Figure: Fig. 1. Plots of posts-per-day distribution in 2005 (1(a)) and the distribution properties of posts-per-forum in increasing week windows from 23/3/2005 (1(b)). Panels: (a) Boxplot of posts density in 2005 (axis: Posts-per-day); (b) Kurtosis and Skewness of density distributions (axes: Number of Weeks, Kurtosis|Skewness)]
  • 26. Experimental Setup: Concept Graphs
      ◦ Concept models come in two flavours:
          a) Entity types (classes that an entity is an instance of)
          b) Categories (skos categories that the entity belongs in)
      ◦ Experiments use two concept graph types:
          a) Type Graph: DBpedia Ontology Graph
              · Nodes: classes
              · Edges: ontological relations
          b) Category Graph: DBpedia Category Structure
              · Nodes: categories
              · Edges: skos relations (broader, narrower)
  • 27. Experimental Setup: Evaluation Measures
      ◦ Compare predicted rank against the ground truth rank
      ◦ Measure rank quality using:
          ▪ Kendall Tau-b (1 is better)
              · Difference in the number of concordant and discordant pairs
          ▪ Impurity@k (0 is better)
              · Distance from each wrongly positioned forum to its true position
              · Measure for k={1,5,10,20,50,100} and average
      [Paper excerpt] […]:broader relations. Our evaluation therefore not only looks for the optimum combination of abstraction measure and composite function, but also which concept graph to use: the Type graph or the Category graph.
      Table 1. Example rankings of forums in two predicted ranks from model 1 (M1) and model 2 (M2) together with the ground truth. The label function l(.) returns the level of the forum from the ground truth. Our evaluation measures (Kendall τb and Impurity@k) are provided with the ordered levels as input.
          Rank Index | GT: d  l(d) | M1: d̂1  l(d̂1) | M2: d̂2  l(d̂2)
           1         |     a   1   |      c    2     |      a    1
           2         |     b   1   |      d    2     |      b    1
           3         |     c   2   |      g    3     |      c    2
           4         |     d   2   |      h    3     |      d    2
           5         |     e   2   |      a    1     |      f    2
           6         |     f   2   |      e    2     |      g    3
           7         |     g   3   |      i    3     |      e    2
           8         |     h   3   |      b    1     |      h    3
           9         |     i   3   |      j    3     |      i    3
          10         |     j   3   |      f    2     |      j    3
      Evaluation Measures. To evaluate our approach we use the different combinations of: a) composite functions, b) abstraction measures, and c) concept graphs, to produce a predicted rank ($\hat{d}$) - ordering the most specific forum to the most general - which is then compared against a ground truth rank ($d$). The ground truth rank of the forums is derived from the hierarchical structure of Boards.ie which allows a given forum to be declared as either a parent or a child of another forum, thereby creating a nested structure. In this setting there are three levels that a given forum can be placed in: 1 is most specific, 3 is most general and 2 is in-between. In order to aid comprehension of our evaluation setting we present example rankings produced by two hypothetical models (M1 and M2) in Table 1 along with the ground truth (GT). We refer to this evaluation setting as level-based ranking as each model (M1, M2) returns a level ordering (using a label […]
      [Callout: Level-based Ranking]
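The level-based comparison can be tried out on the rankings of Table 1. The sketch below is a plain-Python Kendall τb written for illustration; it assumes the measure is applied position by position to the ground-truth level sequence and each model's level sequence, which is one reading of "provided with the ordered levels as input".

```python
import math
from itertools import combinations
from typing import Sequence

def kendall_tau_b(x: Sequence[int], y: Sequence[int]) -> float:
    """Kendall tau-b: (concordant - discordant) pairs, corrected for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both sequences: ignored by tau-b
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denominator = math.sqrt((concordant + discordant + tied_x_only)
                            * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denominator if denominator else 0.0

# Level sequences read off Table 1, in rank order.
GROUND_TRUTH = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
MODEL_1 = [2, 2, 3, 3, 1, 2, 3, 1, 3, 2]
MODEL_2 = [1, 1, 2, 2, 2, 3, 2, 3, 3, 3]

tau_m1 = kendall_tau_b(GROUND_TRUTH, MODEL_1)  # 0.125
tau_m2 = kendall_tau_b(GROUND_TRUTH, MODEL_2)  # 0.75
```

Under this reading M2, which misplaces only two neighbouring forums, scores far closer to 1 than M1, which scatters level-1 forums across the ranking.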
  • 28. Experiments: Results
      ◦ Best model: Eigenvector Centrality with Type Graph
          ▪ For full rank: concept frequency
          ▪ For top-k: CF-IFF
      ◦ Type graph > category graph
      [Paper excerpt] […] (Type graph with Concept Frequency and Eigenvector Centrality) we do slightly worse than the random baseline, thereby failing to achieve the best performance when focussing on top-k ranks.
      [Figure: Fig. 2. Plots of the results obtained when measuring forum specificity using: a) the […]. Panels: (a) Types - Kendall τb; (b) Categories - Kendall τb; (c) Types - Impurity@k; (d) Categories - Impurity@k. Each panel compares Concept Frequency and CF-IFF across Network Ent, Degree Cent, Eigenv Cent, StatSub and KPP. Callouts: "Lower is better"; "Random model baseline"]
  • 29. Evaluation: Qualitative Insights
      ◦ ‘Discworld’ appears top for Concept Frequency
          ▪ Both measures return a similarly specific assessment
      ◦ ‘Subscribers’ appears in CF-IFF lists:
          ▪ Selected by the function, but different values obtained using the measures
      [Paper excerpt] […]duced by the models. Similarities are evident when the same composite function is used: Discworld appears at the top of both abstraction measures when using Concept Frequency - indicating that the concept selected from this forum has the same specificity levels for both abstraction measures - while Subscribers, despite being a mid-level forum, appears towards the top rank of each abstraction measure when using CF-IFF - indicating the existence of a concept unique to this forum which shares a similar specificity level across the measures. Such qualitative analysis indicates that despite the composite functions selecting the same concept to measure the abstraction of, the measures produce, in general, different rankings based on the concept’s network position.
      Table 2. Forum rankings using the Type Graph and different combinations of composite functions and abstraction measures. The integers in parentheses represent the level of the forum on Boards.ie: 1 = most specific, 3 = most general.
          Concept Frequency                                  | CF-IFF
          Network Entropy          | Eigenv’ Cent’           | Network Entropy          | Eigenv’ Cent’
          Discworld (1)            | Discworld (1)           | Languages (1)            | Magic the Gathering (1)
          The Cuckoo’s Nest (2)    | Angling (2)             | Hunting (1)              | Subscribers (2)
          Models (2)               | Paganism (1)            | File Exchange (2)        | Unreal (2)
          Slydice Specials (1)     | Feedback (2)            | Game Threads (1)         | LAN Parties (2)
          Battlestar Galactica (1) | Personal Issues (2)     | Magic the Gathering (1)  | World of Warcraft (1)
          FS Motors (1)            | Mythology (2)           | Bangbus (1)              | Role Playing (2)
          Gadgets (1)              | Films (1)               | Biology & Medicine (2)   | Midwest (2)
          FS Music Equipment (1)   | Business Managem’ (1)   | Snooker & Pool (2)       | Game Threads (1)
          Pro Evolution Soccer (2) | Xbox (1)                | Subscribers (2)          | GAA (2)
          Call of Duty (2)         | Help Desk (2)           | HE Video Players (1)     | Midlands (2)
          Anime & Manga (2)        | DIT (2)                 | Discworld (1)            | Discworld (1)
  • 30. Conclusions
      ◦ Presented an approach to measure the topical specificity of online community forums:
          ▪ Extracted entities from forums, recorded frequencies
          ▪ Selected concepts to measure using composite functions
          ▪ Measured concept specificity using information-theoretic measures
      ◦ Findings showed:
          a) Type graph > category structures
          b) Eigenvector centrality was the best measure
          c) Differences in composite function…
              · Top ranks: CF-IFF
              · Full rank: Concept Frequency
  • 31. Current and Future Work
      ◦ Community tracking
          ▪ Grouped high and low specificity communities
          ▪ Found significantly greater semantic divergence in low specificity communities
      ◦ Attention Dynamics Theory
          ▪ Plan to group communities by topical specificity
          ▪ Examine how attention dynamics differ
          ▪ Learn heuristics for dynamic model adaptation
      [Poster shown on the slide]
      The Semantic Evolution of General and Specific Communities
      Matthew Rowe (1) and Claudia Wagner (2)
      1. School of Computing and Communications, Lancaster University, Lancaster, UK. m.rowe@lancaster.ac.uk
      2. Institute for Information and Communication Technologies, JOANNEUM Research, Graz, Austria. claudia.wagner@joanneum.at
      Motivation
          ◦ Past work [1] examined community lifecycle events (creation, growth, etc.), eschewing topical dynamics
          ◦ Our prior work [2] found a link between the topical focus of a community and its attention dynamics
              ▪ E.g. ‘Golf’ community forum required posted content to exactly match the community
          ◦ Changes in a community’s topics require content injection models to be re-trained
              ▪ Expensive and time-consuming process, and often performed offline
          ◦ Our solution: measure community specificity, then examine how community evolution relates to specificity
      Measuring Topical Specificity
          ◦ Dataset: 230 community forums from Boards.ie
          ◦ Took all posts published within a 1-week window from 23/03/2005
          ◦ Extracted entities from posts using Zemanta
          ◦ Identified the type of each entity using the DBpedia ontology: <entity> rdf:type <class>
          ◦ Measured the specificity of each forum [3] using combinations of:
              ▪ Class abstraction measures (e.g. Network entropy)
              ▪ Composite Functions (e.g. Most specific concept)
          ◦ Compared performance against the Knuth Shuffle (random model)
          ◦ Evaluated using Kendall Tau-b and Max-Levels (proposed measure)
              ▪ Max-levels judges the maximal potential outlier level
          ◦ Best model: Eigenvector Centrality with Concept Frequency
          [Figure: MaxLevels for each composite function (Most Spec, Mean Spec, Concept Freq, CF-IFF) across abstraction measures (Network Ent, Degree Cent, Eigenv Cent, Hits Authority, Hits Hub, StatSub, KPP)]
          [Figure: density plot for Eigenvector Centrality + Concept Frequency]
      Semantic Evolution
          ◦ Divided forums up into 10 equal-frequency bins based on specificity
              ▪ Measured: Eigenvector Centrality with Concept Frequency
              ▪ High-specificity forums = members of top bin
              ▪ Low-specificity forums = members of bottom bin
          ◦ For these forums, randomly selected four 1-week periods and derived concepts over each period, giving one concept vector per period
          ◦ Calculated cosine similarity between consecutive concept vectors
          ◦ Examined difference between cosine similarity distributions of high and low forums:
              ▪ Low=0.783, high=0.854. p<0.05 with Student’s t-test (2-tailed)
          ◦ The result indicates that general communities exhibit greater semantic drift than topically specific communities
      References
          [1] Q. Hong, S. Kim, S. C. Cheung, and C. Bird. Understanding a developer social network and its evolution. In Proceedings of the 2011 27th IEEE International Conference on Software Maintenance, ICSM ’11, pages 323–332, Washington, DC, USA. (2011)
          [2] C. Wagner, M. Rowe, M. Strohmaier, and H. Alani. Ignorance isn’t bliss: an empirical analysis of attention patterns in online communities. In AES Conference on Social Computing. (2012)
          [3] M. Rowe, C. Wagner, M. Strohmaier, and H. Alani. Measuring the Topical Specificity of Online Communities. In the proceedings of the Extended Semantic Web Conference. Montpellier, France. (2013)
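The semantic-drift step on the poster, cosine similarity between consecutive weekly concept vectors, can be sketched as follows. The vectors are represented as concept-to-frequency dictionaries, and the four weekly vectors are invented for the example; lower similarity between consecutive weeks would indicate greater drift.

```python
import math
from typing import Dict, List

ConceptVector = Dict[str, int]  # concept -> frequency in one 1-week period

def cosine_similarity(a: ConceptVector, b: ConceptVector) -> float:
    """Cosine of the angle between two sparse concept-frequency vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def consecutive_similarities(vectors: List[ConceptVector]) -> List[float]:
    """Similarity of each period's vector with the vector of the next period."""
    return [cosine_similarity(u, v) for u, v in zip(vectors, vectors[1:])]

# Invented concept vectors for four 1-week periods of one forum.
weekly_vectors = [
    {"Book": 3, "Writer": 1},
    {"Book": 3, "Writer": 1},
    {"Device": 2},
    {"Device": 2, "Book": 2},
]
drift_profile = consecutive_similarities(weekly_vectors)
```

Four periods yield three consecutive similarities per forum; the distributions of these values for the high- and low-specificity groups are what the poster compares.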
      [Concept vectors for the four periods: {c1, c2, c3, c4}]
      [Figure: F(Cumulative Specificity) against Cumulative Specificity and F(Cumulative Specificity Diff) against Cumulative Specificity Difference, each comparing High Specificity and Low Specificity forums]