FROM MINING TO UNDERSTANDING: THE EVOLUTION OF SOCIAL WEB USERS
DR. MATTHEW ROWE
SCHOOL OF COMPUTING AND COMMUNICATIONS
@MROWEBOT | M.ROWE@LANCASTER.AC.UK
Faculty of Science and Technology Christmas Conference
Lancaster University, UK
Our interests develop ‘Offline’

Primary School → High School → University → Postgrad → Postdoc → Lecturing (Time)
And so too do our social networks…

Offline, we develop in terms of both our interests and social networks
Primary School → High School → University → Postgrad → Postdoc → Lecturing (Time)
This also happens ‘online’, on the ‘Social Web’…

First, Web 1.0

Then, Web 2.0… the ‘Social Web’
Why study user evolution?
…to understand how people behave online
…to learn how people shape their identities
…to predict churners (from social networks and online communities)
…to build better recommender systems
Talk Outline
•  User Lifecycles, Properties & Evolution Measures
•  Predicting Churners
•  Recommending Items
•  Conclusions
User Lifecycles
Modelling User Evolution: Lifecycles
Offline Lifecycle Periods
Primary School → High School → University → Postgrad → Postdoc → Lecturing (Time)

First Action … Last Action
Lifecycle Periods of a potential Question-Answering System user (conjecture!)
Novice Users → Asking Questions → Asking & Answering Questions → Answering Questions

In reality we do not know the labels; however, we can split the lifetime into equal time intervals:
1 | 2 | 3 | … | n

Yet users distribute their activity non-uniformly across their lifecycles.
User Properties in Lifecycle Stages
We divide lifetime into equal activity periods:
#actions in period 1 = #actions in period 2 = … = #actions in period n

Within each period we:
•  Model the actions to user u by other users
•  Model the actions by user u to other users
•  Model the terms used by user u (a Term/Count table; the example shows terms such as Semantic, Web, Mining and Statistics, with counts between 3 and 17)
•  Model the tastes of the user (an Item/Rating table):

Item          Rating
Alien         4*
Bladerunner   5*
Star Wars     4*
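The equal-activity split described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `activity_slices` and the sample timestamps are hypothetical, and it assumes each action carries a single numeric timestamp.

```python
from typing import List, Sequence, Tuple


def activity_slices(timestamps: Sequence[float], n: int) -> List[Tuple[float, float]]:
    """Split a lifetime into n periods holding (near-)equal numbers of actions.

    Returns one (first_action_time, last_action_time) pair per period.
    """
    ts = sorted(timestamps)
    if n <= 0 or len(ts) < n:
        raise ValueError("need at least n actions to form n periods")
    periods = []
    for k in range(n):
        # Integer boundaries spread any remainder evenly across the periods.
        start = (k * len(ts)) // n
        end = ((k + 1) * len(ts)) // n
        chunk = ts[start:end]
        periods.append((chunk[0], chunk[-1]))
    return periods


# Hypothetical action times (in days since the user's first action).
posts = [1, 2, 3, 10, 11, 50, 51, 52, 53, 200]
slices = activity_slices(posts, 5)
```

Each period here holds two actions, yet the periods span very different lengths of time, which is the point of slicing by activity rather than by the calendar.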
How can we track the evolution of user’s properties?
Solution: use measures from information theory

How do the properties differ between time steps?
Evolution measure 1: Cross-Entropy
Decrease = similarity between properties
Compares user properties in period s with user properties in period s-1.

… by computing the cross-entropy of one probability distribution with respect to another distribution from an earlier lifecycle period, and then selecting the distribution that minimises cross-entropy. Assuming we have a probability distribution (P) formed from a given lifecycle period ([t, t']), and a probability distribution (Q) from an earlier lifecycle period, then we define the cross-entropy between the distributions as follows:

H(P, Q) = -\sum_{x} p(x) \log q(x)    (5)

In the same vein as the earlier entropy analysis, we derived the period cross-entropy for each platform’s users throughout their lifecycles and then derived the mean cross-entropy for the 20 lifecycle periods. Figure 2 presents the cross-entropies derived for the different platforms and user properties. We observe that for each distribution and each …
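As a rough illustration of the cross-entropy measure, the sketch below computes H(P, Q) = -Σ p(x) log q(x) for two discrete distributions held as dictionaries. The function name, the natural logarithm and the small `eps` floor for outcomes missing from Q are assumptions made for this example; the slides do not say how zero probabilities were handled.

```python
import math
from typing import Dict


def cross_entropy(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-12) -> float:
    """H(P, Q) = -sum_x p(x) log q(x), using the natural logarithm.

    eps is an assumed floor that avoids log(0) when q lacks an outcome of p.
    """
    return -sum(
        px * math.log(max(q.get(x, 0.0), eps))
        for x, px in p.items()
        if px > 0
    )


# Hypothetical property distributions for period s (P) and an earlier period (Q).
p_now = {"a": 0.5, "b": 0.5}
q_similar = {"a": 0.5, "b": 0.5}
q_shifted = {"a": 0.9, "b": 0.1}
same = cross_entropy(p_now, q_similar)
drifted = cross_entropy(p_now, q_shifted)
```

The value for the identical earlier distribution is lower than for the shifted one, matching the slide's reading that a decrease signals similarity between periods.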
How much information is transferred from one stage to the next?
Evolution measure 2: Conditional Entropy
Decrease = more information is transferred
Compares user properties in period s with user properties in period s-1.

By using conditional entropy we can assess the information needed to describe the taste profile of a user at one time step (Q) using his taste profile from the previous period (P). A reduction in conditional entropy indicates that the user’s taste profile is similar to that of his previous stage’s profile, while an increase indicates the converse. We define the conditional entropy of two discrete probability distributions, representing taste profiles, as:

H(Q|P) = \sum_{x \in P,\, y \in Q} p(x, y) \log \frac{p(x)}{p(x, y)}    (5)

We derived the conditional entropy over the 5 lifecycle periods in a pairwise fashion, i.e. H(P2|P1), …, H(P5|P4), and plotted the curve of the mean conditional entropy in Figure 5 over each dataset’s users in the training split, also including the 95% confidence intervals to show the variation in the conditional entropies. Figure 5 indicates that …
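A small sketch of the conditional-entropy measure, assuming the joint distribution p(x, y) over the previous-stage outcome x and the current-stage outcome y is already available as a dictionary. How that joint is estimated from a user's ratings is not given in the slides, so the inputs below are purely hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


def conditional_entropy(joint: Dict[Tuple[str, str], float]) -> float:
    """H(Q|P) = sum_{x,y} p(x, y) log(p(x) / p(x, y)).

    joint maps (x, y) to p(x, y), with x drawn from P and y from Q.
    """
    p_x = defaultdict(float)
    for (x, _y), pxy in joint.items():
        p_x[x] += pxy  # marginalise over y to obtain p(x)
    return sum(
        pxy * math.log(p_x[x] / pxy)
        for (x, _y), pxy in joint.items()
        if pxy > 0
    )


# Current stage fully determined by the previous stage: nothing left to explain.
determined = {("scifi", "scifi"): 0.5, ("drama", "drama"): 0.5}
# Current stage independent of the previous stage: one full bit (ln 2 nats) remains.
independent = {(x, y): 0.25 for x in ("scifi", "drama") for y in ("scifi", "drama")}
h_det = conditional_entropy(determined)
h_ind = conditional_entropy(independent)
```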
How do global dynamics influence the user’s properties?
Evolution measure 3: Transfer Entropy
Decrease = more susceptible to global influence
Increase = less susceptible to global influence

… examine the information transfer from a prior lifecycle stage (s-1) to the current lifecycle stage (s) of the user. Now, assume that we have a random variable that describes the local categories that have been reviewed at the current stage (Y_s), a random variable of local categories at the previous stage (Y_{s-1}), and a third random variable of global categories at the previous stage (X_{s-1}). We then define the transfer entropy of one lifecycle stage to another as follows, based on the work of Schreiber [8]:

T_{X \to Y} = H(Y_s | Y_{s-1}) - H(Y_s | Y_{s-1}, X_{s-1})    (6)

The first term is the surprise in user properties from s-1 to s; the second term is the surprise in user properties in s when we consider all users’ properties from s-1.

Using the above probability distributions we can calculate the transfer entropy based on the joint and conditional probability distributions given the values of the random variables …
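The transfer-entropy definition T = H(Y_s | Y_{s-1}) - H(Y_s | Y_{s-1}, X_{s-1}) can be sketched directly from a joint distribution over the three variables. The function name and the toy joint distributions are hypothetical; estimating the joint from real review data is outside what the slides describe.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


def transfer_entropy(joint: Dict[Tuple[str, str, str], float]) -> float:
    """T_{X->Y} = H(Y_s | Y_{s-1}) - H(Y_s | Y_{s-1}, X_{s-1}).

    joint maps (y_s, y_prev, x_prev) to its probability.
    """
    p_ys_yp = defaultdict(float)  # p(y_s, y_prev)
    p_yp = defaultdict(float)     # p(y_prev)
    p_yp_xp = defaultdict(float)  # p(y_prev, x_prev)
    for (ys, yp, xp), p in joint.items():
        p_ys_yp[(ys, yp)] += p
        p_yp[yp] += p
        p_yp_xp[(yp, xp)] += p
    h_local = sum(
        p * math.log(p_yp[yp] / p) for (ys, yp), p in p_ys_yp.items() if p > 0
    )
    h_local_global = sum(
        p * math.log(p_yp_xp[(yp, xp)] / p)
        for (ys, yp, xp), p in joint.items()
        if p > 0
    )
    return h_local - h_local_global


# The user copies the previous global state; their own past is irrelevant.
follows_global = {
    ("0", "0", "0"): 0.25, ("0", "1", "0"): 0.25,
    ("1", "0", "1"): 0.25, ("1", "1", "1"): 0.25,
}
# The user repeats their own past; the global state adds nothing.
follows_self = {
    ("0", "0", "0"): 0.25, ("0", "0", "1"): 0.25,
    ("1", "1", "0"): 0.25, ("1", "1", "1"): 0.25,
}
t_global = transfer_entropy(follows_global)
t_self = transfer_entropy(follows_self)
```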
Predicting Churners via Evolution Signals
...from Online Communities
… and testing, using the former in this section to examine user development and the latter split for our later detection experiments.

Datasets: Online Communities

Platform       Time Span                  Post Count   User Count
Facebook       [18-08-2007, 24-01-2013]   118,432      4,745
SAP            [15-12-2003, 20-07-2011]   427,221      32,926
Server Fault   [01-08-2008, 31-03-2011]   234,790      33,285
Table 1. Statistics of the online community platform datasets.

[Figure 2: Posts per-day for the datasets; y-axis: Posts Frequency; x-axis: Time; panels (a) Facebook, (b) SAP, (c) Server Fault; slide annotation: Churner ‘Cutoff’]

Defining Lifecycle Periods
In order to examine how users develop over time we needed some means to segment a user’s lifetime (i.e. from the first date at which they post to the date of their final post) into discrete intervals. Prior work [6, 2, 5] has demonstrated the extent to which users develop at their own pace and thus evolve according to their own ‘personal clock’ [5]. Hence, for deriving the lifecycle periods of users within the platforms we adopted an activity-slicing approach that divides a user’s lifetime into 20 discrete time intervals, emulating the approach in [2], but with an equal proportion of activity within each period. This approach functions as follows: we derive the set of interval tuples ({[t_i, t_j]} ∈ T) by first deri…
Cross-Entropy: dissimilarity with prior in-degree information
I.e. how do users who contact a given user differ from before?

[Figure: Time-period Cross Entropy (y-axis) against Lifecycle Stages (x-axis, 0 to 1) for Churners and Non-churners; panels (a) In-degree - Facebook, (b) In-degree - SAP, (c) In-degree - Server Fault]

… than churners. For the cross-entropy of users’ lexical term distributions we find the signals of churners and non-churners to follow a similar curvature (converging on a limit with a decaying rate) but with different magnitudes.

Cross-Entropy: dissimilarity with community out-degree information
I.e. how do the users that a user contacted differ from the community?

[Figure: Community Cross Entropy (y-axis) against Lifecycle Stages (x-axis, 0 to 1); panels (d) Out-degree - Facebook, (e) Out-degree - SAP, (f) Out-degree - Server Fault]

Predicting Churners

1. Extract Features from Users’ Evolution cross-entropy curves
•  Magnitude @ period s
•  Change in the magnitude from period s to s+1

2. Build the prediction model
•  Define the objective function
•  Learn the model by minimising the objective (goal: learn w)

3. Apply the model
•  Over ‘held-out’ data
•  Evaluate performance: how accurate is our predictor?

… feature definition and model specification, we alter the lifecycle period notation from the existing interval tuple set ([t, t'] ∈ T) to use a set of discrete single elements: s ∈ S where S = {1, 2, …, 20}. Magnitude features are defined as a given user’s measure taken at a given lifecycle period: m(u, s), where the measure for user u is taken at lifecycle period s. Rates of the signal are defined as changes in measures from one lifecycle period to the next:

m'(u, s) = \frac{dm}{ds} = \frac{m(u, s+1) - m(u, s)}{m(u, s)}

Where m is indexed by the given measure (i.e. in-degree period cross-entropy) to return the magnitude of a given measure (m) at the allotted lifecycle period. Thus a feature vector is formed for a single user using these rate and magnitude features:

x = [m_1(u, 2), …, m_1(u, 19), m_2(u, 2), …, m_2(u, 19), …, m'_1(u, 2), …, m'_1(u, 18), m'_2(u, 1), …, m'_2(u, 18)]

… the standard linear model: f(x; w) = w^T x. We include the L2-regulariser within the model to control for overfitting on the training splits and test different λ-indexed models. In learning the model’s weight vector w, our goal is to minimise the cost function (C(w)) with respect to the weight vector:

C(w) = \frac{1}{2|D_{train}|} \sum_{i=1}^{|D_{train}|} (f(x_i; w) - y_i)^2 + \lambda \lVert w \rVert_2^2    (6)

Where the former term is the error of the model, the latter term (\lVert w \rVert_2) defines the L2-regularizer, and λ defines the weight of this regularizer’s effect on the model, and thus controls for overfitting on the training split:

\lVert w \rVert_2 = \Big( \sum_{j=0}^{|w|} |w_j|^2 \Big)^{1/2}    (7)

As a result of using both rates and magnitudes from each of the 20 lifecycle periods, aside from the first and last o…
For learning the parameter weight vector (w) we use gra…
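The three steps above (extract magnitude and rate features, fit a regularised linear model, apply it) can be sketched in plain Python. Several things here are assumptions rather than the authors' method: the rate is read as the relative change (m(u, s+1) - m(u, s)) / m(u, s), the regulariser is the squared L2 norm weighted by `lam`, and plain batch gradient descent with hypothetical learning rate and step count stands in for whatever optimiser was actually used.

```python
from typing import List, Sequence


def features(m: Sequence[float]) -> List[float]:
    """Magnitudes of the interior periods followed by their relative rates.

    m holds one measure per lifecycle period (0-indexed here); values are
    assumed non-zero so that the relative rate is defined.
    """
    mags = list(m[1:-1])
    rates = [(m[s + 1] - m[s]) / m[s] for s in range(1, len(m) - 2)]
    return mags + rates


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear model f(x; w) = w^T x."""
    return sum(wj * xj for wj, xj in zip(w, x))


def cost(w, X, y, lam: float) -> float:
    """C(w): halved mean squared error plus lam times the squared L2 norm."""
    err = sum((predict(w, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * len(X))
    return err + lam * sum(wj * wj for wj in w)


def fit(X, y, lam: float = 0.0, lr: float = 0.1, steps: int = 2000) -> List[float]:
    """Minimise C(w) with batch gradient descent (an assumed optimiser)."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [2 * lam * wj for wj in w]  # gradient of the regulariser
        for xi, yi in zip(X, y):
            e = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += e * xj / len(X)  # gradient of the error term
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy data: label 1 = churner, 0 = non-churner (hypothetical feature vectors).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 0.0, 1.0]
w_plain = fit(X, y, lam=0.0)
w_ridge = fit(X, y, lam=1.0)
```

With 20 lifecycle periods, `features` yields 18 magnitudes and 17 rates per measure. Raising `lam` shrinks the learned weights, which is how the regulariser trades training fit against overfitting.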
Evaluation: Results
Higher = better. Min = 0, Max = 1. J48 = state of the art baseline.

Area Under the receiver operator characteristic Curve (AUC) scores for the different regularisation models and the J48 baseline model from the state of the art (denoted by J48). Significance of improvement over the random model baseline is indicated.

Platform / Feature Set        λ=0        λ=1        λ=2        λ=5        λ=10
Facebook (J48 = 0.586)
  In-degree                   0.535.     0.543.     0.538.     0.556*     0.549**
  Out-degree                  0.674***   0.666***   0.676***   0.696***   0.690***
  Lexical                     0.633***   0.630***   0.639***   0.637***   0.641***
  Cross-period                0.649***   0.642***   0.649***   0.652***   0.651***
  Cross-community             0.684***   0.693***   0.691***   0.699***   0.701***
  All                         0.811***   0.804***   0.816***   0.817***   0.819***
SAP (J48 = 0.759)
  In-degree                   0.652***   0.651***   0.651***   0.652***   0.654***
  Out-degree                  0.741***   0.742***   0.742***   0.742***   0.743***
  Lexical                     0.501      0.501      0.501      0.499      0.497
  Cross-period                0.614***   0.614***   0.614***   0.613***   0.612***
  Cross-community             0.765***   0.765***   0.765***   0.765***   0.765***
  All                         0.816***   0.817***   0.817***   0.817***   0.818***
Server Fault (J48 = 0.796)
  In-degree                   0.659***   0.658***   0.662***   0.663***   0.663***
  Out-degree                  0.618***   0.617***   0.616***   0.619***   0.626***
  Lexical                     0.680***   0.682***   0.687***   0.686***   0.684***
  Cross-period                0.671***   0.675***   0.680***   0.691***   0.689***
  Cross-community             0.778***   0.779***   0.780***   0.778***   0.779***
  All                         0.858***   0.860***   0.861***   0.861***   0.860***
Significance codes: p-value < 0.001 ***, 0.01 **, 0.05 *, 0.1 ., 1

… to late lifecycle periods. Across all three platforms we find that performance improves as additional information is added into the models. There are differences, … in the gradient in performance between the plat…

Our churn prediction approach makes use of the development signals that users exhibit along both social an… dimensions in order to differentiate between who will churn and who will remain within the online community p…
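For readers unfamiliar with the AUC scores reported here, the following sketch computes AUC as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, counting ties as half. The function name and sample scores are hypothetical, and this pairwise formulation is a standard equivalent of the area under the ROC curve rather than the evaluation code used for the talk.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: 1 = churner, 0 = non-churner.
perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
uninformative = auc([1, 0], [0.5, 0.5])
partial = auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

A score of 0.5 corresponds to a random ranking, which is why the SAP lexical features at roughly 0.50 in the table carry no significance marker.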
By mining users’ evolution signals we can accurately predict
who will churn, and who will not…
…this enables the early application of retention strategies

Recommending Items from Taste Evolution
Recommender Systems aim to either:
(i)  Predict item adoptions
(ii)  Predict item ratings
Recommendation Datasets: Item-Ratings

Table 1: Statistics of the review datasets used for our analysis and experiments.
Dataset              #Users    #Items    #Ratings    Time Span                  Ratings Scale
MovieLens            6,024     3,678     902,585     [26-04-2000, 31-12-2000]   [1,5]
Movie Tweetings      19,043    11,451    117,206     [28-02-2013, 23-09-2013]   [1,10]
Amazon Movies & TV   889,173   253,059   7,880,387   [20-08-1997, 25-10-2012]   [0,5]
Total                914,240   268,188   8,900,178

Each rating is a quadruple (u, i, r, t): user u… …rated item i… …with score r… …at time t.

[Figure: per-dataset distributions (axis labels include Average Rating and Lifetime (in days)); annotated means µ=7.7, µ=3.7 and µ=4.1]

… number of items within a particular category that have been reviewed, we instead include the ratings when calculating the distribution. We first define two sets, the former (D_{train}^{u,s,c}) corresponding to the ratings by u during interval s for items from category c, and the latter (D_{train}^{u,s}) corresponding to ratings by u during s, hence D_{train}^{u,s,c} \subseteq D_{train}^{u,s}. These sets are formed as follows:

D_{train}^{u,s,c} = \{(u, i, r, t) : (u, i, r, t) \in D_{train},\ t \in s,\ c \in cats(i)\}    (1)

D_{train}^{u,s} = \{(u, i, r, t) : (u, i, r, t) \in D_{train},\ t \in s\}    (2)

We then define the function ave_rating to derive the average rating value from all rating quadruples in the set:

ave\_rating(D_{train}^{u,s}) = \frac{1}{|D_{train}^{u,s}|} \sum_{(u,i,r,t) \in D_{train}^{u,s}} r    (3)

… the statistics of these datasets shown in Table 2 demonstrate the extent to which the items, users and ratings have been reduced. Figure 2 presents the distribution of reviews per user within each of the reduced datasets; we note that the collection strategy of MovieLens (concentrating on users who have reviewed more than 20 items) skews the distribution towards users who have produced many reviews, while for Movie Tweetings and the Amazon datasets we see heavy-tailed distributions. The table also indicates that there is a large reduction in both users and items overall; however, the reduction in the number of ratings given is not as great. This suggests two things: (i) mapped items are popular, and thus dominate the ratings; and (ii) obscure items are present within the data. In particular for the Amazon dataset, despite our alignment covering only 10.6% of items we only have a reduction of 26.9% of total ratings, suggesting that we cover the ‘head’ of the distribution in terms of popularity.

4.2 Amazon Movies
For the Amazon Movie and TV Reviews dataset we were provided with Amazon Standard Identification Numbers (ASINs) as identifiers of items. We looked up the ASINs for each item in the dataset by querying the Amazon Product Advertising API and returning the item information including: title, actors, and directors. Unlike MovieLens and Movie Tweetings, we were not provided with the year of release information from the API; therefore, to perform the disambiguation of semantic URIs we used the actor information from each movie, our intuition being that each movie would have a unique set of actors starring in it. Therefore we stored the actors associated within each item as additional background information and performed disambiguation in a similar vein as above: we first identified candidate URIs for a given movie item by performing fuzzy matches between the item title and semantic URIs’ titles. We then derived the correct URI by comparing the set of actors associated with the item (Aa) and the set of actors associated within each candidate URI …

[Figure: per-dataset distributions p(x) on logarithmic axes; panels (a) Lens, (b) Tweetings, (c) Amazon; annotated means µ=5.8, µ=139.7 and µ=12.5]

Forming Taste Profiles

From these definitions we then derive the discrete probability distribution of the user’s ratings per category as follows.

Probability of user u rating category c highly in lifecycle period s:

Pr(c | D_{train}^{u,s}) = \frac{ave\_rating(D_{train}^{u,s,c})}{\sum_{c' \in C_{train}^{u,s}} ave\_rating(D_{train}^{u,s,c'})}    (4)

I.e. using the mean user score per category for a given user and lifecycle period.

Item          Rating
Alien         4*
Bladerunner   5*
Star Wars     4*

Category          Rating
Space_adventure   (4+4)/2 = 4
Science Fiction   (4+5+4)/3 = 4.3

… information returned. For instance, for the movie ‘Alien’, released in 1979, which we shall now use as a running example, the following categories are found:

<http://dbpedia.org/resource/Alien_(film)>
    dcterms:subject category:Alien_(franchise)_films ;
    dcterms:subject category:1979_horror_films ;
    dcterms:subject category:Space_adventure_films ;
    dcterms:subject category:Films_set_in_the_future .

Subject categories form a hierarchical structure such that parent categories define more general subjects. For instance the category category:Films_set_in_the_future is linked to category:Science_fiction_films_by_genre by the predicate skos:broader, thus providing a general taxonomic classification of the film. The advantage of such a structure is that we can explicitly identify a given user’s tastes at a given point in time via the categories of films that they have consumed, and thus rated. In order to provide such information, however, we require a link between a given item within one of our three datasets and the semantic URI that denotes that movie item. However, in deriving semantic web URIs for films we may encounter ambiguity issues where multiple films share the same title - this often happens with film remakes. Therefore we use available information from each of our datasets to disambiguate the semantic URIs and thus return the correct alignment. In this section we describe this disambiguation procedure across the three datasets using two methods: one based on title and year of the movie …

Figure 3: Average ratings derived using a 7-day moving average of the top-2 most frequently rated categories.

5. ANALYSING TASTE EVOLUTION
Analysing the evolution and development of users’ tastes allows one to understand how a given user is likely to …
0.290

●

1

2

3

4

Lifecycle Stages

(a) Lens

5

●

●

0.205

0.275

0.225

●

0.215

0.285

●
●

●

0.210

●

●

Conditional Entropy

●

●

0.280

Conditional Entropy

0.245

●

0.235

Conditional Entropy

rate items in the future given their category information.
Conversely, for MovieLens and Movie Tweetings we see an
Conditional-Entropy: relative profiles become less
opposite e↵ect: users’ taste information differencepredictable
I.e. how dissimilar is the user’s ratings in period s from period s-1?
as they develop; users rate items in a way that renders uncertainty in profiling from previous information.

1

2

3

4

Lifecycle Stages

(b) Tweetings

5

1

2

3

4

5

Lifecycle Stages

(c) Amazon

Figure 5: Parent category conditional entropy be27
tweento Understanding: The Evolution oflifecycle stages (e.g. H(P2 |P3 ))
consecutive Social Web Users
From Mining
across the datasets, together with the bounds of the
Transfer-Entropy: influence of global behaviour on the user
I.e. how does collective user behaviour influence the user’s tastes?

… Tweetings and Amazon we find a different effect: users’ transfer entropy actually increases over time, indicating that users are less influenced by global taste preferences, and therefore the ratings of other users, and instead concentrate on their own tastes.

[Figure 6: Parent category transfer entropy between consecutive lifecycle stages (e.g. H(P2|P3)) across the datasets, together with the bounds of the 95% con…; y-axis: Transfer Entropy; x-axis: Lifecycle Stages (1 to 5); panels (a) Lens, (b) Tweetings, (c) Amazon]
Including Taste Evolution in a Recommender System (Current Work!)

•  Predict rating for user u for item i, as a bias component plus a personalisation component of f latent factors:

\hat{r}_{ui} = b_{ui} + p_u^\top q_i    (8)

•  Modify the bias component to include the taste evolution signal:

b_{ui} = \underbrace{\mu + b_i + b_u}_{\text{Static}} + \underbrace{b_{i,cats(i)} + b_{u,cats(i)}}_{\text{Evolving}}    (9)

   b_{i,cats(i)}: how global tastes for the categories of item i have evolved
   b_{u,cats(i)}: how the tastes of user u have evolved for the categories of item i

•  Interpolate categories into the personalisation component: too much maths to be shown here!

6.1 Recommendation Model Formulation
We use the following model for our recommender system based upon matrix factorisation:

\hat{r}_{ui} = b_{ui} + q_i^\top \Big( p_u + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} y_j \Big)    (19)

Above, we have three latent factor vectors: q_i ∈ R^f denotes the f latent factors associated with the item i; p_u ∈ R^f denotes the f latent factors associated with the user u; and y_j ∈ R^f denotes the f-dimension latent factor vector for item j from the set of rated items by user u: R(u). The latent factors are derived during learning, as we shall explain …, while the number of factors to capture (f) is set a priori - this is often set to 50 within the literature. The factors capture unifying attributes across the items, for instance Romantic Comedies or Action Films in the movies domain. We … Equation 19 to incorporate latent factors for each semantic category that a user has rated an item from. Our intuition behind this inclusion is that certain categories have a …

6.2 Biases
The biases in user u and item i are defined as in Equation 9.

6.2.1 Static Biases
The bias component of our model contains static biases induced from the training segment. These include the general bias of the given dataset (µ), which is shown in Figure 7 as the mean rating score across all ratings within the training segment. The use of the mean on its own is insufficient - note the variance in ratings scores for the dataset - therefore we also include the item bias (b_i) and the user bias (b_u). The former bias is the average deviation from the mean bias for the item i within the training segment, while the latter bias is the average deviation from the mean bias from the training segment’s ratings by user u.
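To make the rating prediction concrete, the sketch below combines the static and evolving bias terms with the dot product of the user and item latent factors, following the form r̂_ui = b_ui + p_u · q_i. All numeric values are hypothetical, and the evolving category biases are passed in as plain numbers because the slides do not show how they are derived.

```python
from typing import Sequence


def predict_rating(
    mu: float,
    b_i: float,
    b_u: float,
    b_i_cats: float,
    b_u_cats: float,
    p_u: Sequence[float],
    q_i: Sequence[float],
) -> float:
    """r_hat = (mu + b_i + b_u) + (b_i_cats + b_u_cats) + p_u . q_i."""
    static = mu + b_i + b_u          # dataset mean, item bias, user bias
    evolving = b_i_cats + b_u_cats   # category-level taste evolution terms
    personalisation = sum(p * q for p, q in zip(p_u, q_i))
    return static + evolving + personalisation


# Hypothetical values: dataset mean 3.7, two latent factors.
r_hat = predict_rating(
    mu=3.7, b_i=0.2, b_u=-0.1,
    b_i_cats=0.05, b_u_cats=0.1,
    p_u=[0.5, -0.2], q_i=[0.4, 0.1],
)
```

Keeping the static and evolving terms separate makes it easy to compare a model with and without the taste evolution signal by zeroing the two category biases.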
By modelling taste evolution we can capture…
(i)  the influence of global dynamics on the user
(ii)  how the user’s preferences for categories change
(iii)  how global tastes are evolving

[Slide 30: overlaid excerpt from the accompanying paper, partly hidden by the slide text.]
Excerpt: … and then selecting the top-2 most frequent. In Figure 3 we plotted the development of the average rating score across these two categories, derived using a 7-day moving average to smooth the variance in rating fluctuations. We find that there are peaks and troughs in the reviewing of the items that belong to the categories; in particular, one can note that for MovieLens the scores remain relatively stable, while for Movie Tweetings ‘Independent Films’ reduce in their average rating and ‘Directorial Debut Films’ increase in their average rating over time. Such information can be encoded within the biases of the recommendation models and […] bias in light of […] consider the stability of a given […] when the rating is being made: i.e. considering the fluctuation of the rating signal and how this relates to previous fluctuations.

[Figure 3: Average ratings derived using a 7-day … Panels: (a) Lens, (b) Tweetings, (c) Amazon. x-axis: Time; y-axis: Average Rating. Legend entries: Independent Films, Directorial Debut Films, 1990s Comedy Films, American Films, Black and White Films.]

[The cropped right-hand column shows the start of Section 5.1 ("Pream…") and a notation list (u, v; i, j; r; D, Dtest; c, c′) whose definitions are cut off.]
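The 7-day moving average mentioned in the excerpt above can be sketched as follows. This is only an illustration: the function name, the trailing (rather than centred) window, the choice to average individual ratings rather than daily means, and the sample data are all assumptions, since the slide does not say how the window was applied.

```python
from datetime import date, timedelta

def moving_average(ratings, window_days=7):
    """ratings: list of (date, score). Returns [(date, mean)] sorted by date.

    For every day that has at least one rating, average all scores whose
    date falls in the trailing window [day - (window_days - 1), day].
    """
    ratings = sorted(ratings)
    days = sorted({d for d, _ in ratings})
    out = []
    for day in days:
        start = day - timedelta(days=window_days - 1)
        window = [s for d, s in ratings if start <= d <= day]
        out.append((day, sum(window) / len(window)))
    return out

# Hypothetical ratings for one category, for illustration only.
sample = [
    (date(2013, 5, 1), 4.0),
    (date(2013, 5, 2), 2.0),
    (date(2013, 5, 8), 5.0),
    (date(2013, 5, 9), 3.0),
]
smoothed = moving_average(sample)
```

Plotting the second element of each pair in `smoothed` against its date, once per category, would give curves of the kind described for Figure 3.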
31

Conclusions

•  User evolution can be captured using lifecycle models
•  Users’ tastes are susceptible to global taste influence
•  Churners and non-churners exhibit divergent signals

[Slide 32: the three statements above are overlaid on cropped excerpts and figures from the accompanying papers.]

Excerpt (Section 6.2.1, Static Biases; right-hand side cropped): the legible part of the bias equation reads b_ui = µ + b_i + …, grouped under the label "Static". The surrounding fragments mention the bias of the given dataset, the mean rating score, and the user bias (b_u). The heading of Section 6.2.2 ("Category Bia…") is cut off.

Excerpt (lifecycle features; right-hand side cropped): the fragments refer to magnitudes m(u, s) for lifecycle period s, rates defined as dm/ds, a feature vector x = [m1(u, 2), …], 20 lifecycle periods, and 18 magnitude features. The remaining text is not recoverable.

[Figure 7: Distribut… three datasets. Panel (a) Lens; x-axis: Average Rating (1 to 5); marked value µ=3.7.]

[Figure (caption not shown): Community Cross Entropy across Lifecycle Stages for Churners and Non-churners. Panels: (a) In-degree - Facebook, (b) In-degree - SAP, (c) In-degree - Server Fault, (d) Out-degree - Facebook, (e) Out-degree - SAP, (f) Out-degree - Server Fault, (g) Lexical - Facebook, (h) Lexical - SAP, (i) Lexical - Server Fault.]

Excerpt (transfer entropy): We derived the transfer entropy between consecutive lifecycle periods, as with the conditional entropy above, to examine how the influence of global and local dynamics on users’ taste profiles developed over time. Figure 6 plots the means of these values across the lifecycle periods together with the 95% confidence intervals. We find that the transfer entropy of MovieLens users decreases over time, indicating that global dynamics have a stronger influence on users’ taste profiles towards later lifecycle stages. Such an effect is characteristic of users becoming more involved and familiar with the review system, and as a consequence paying attention to more information from the users. With Movie Tweetings and Amazon we find a different effect: users’ transfer entropy actually increases over time, indicating that users are less influenced by global taste preferences, and therefore the ratings of other users, and instead concentrate on their own tastes.

Legible equation fragments: p(y | y′), with sums over y ∈ Y_s, y′ ∈ Y_{s−1}, x ∈ X_{s−1}.

[Figure 6: mean transfer entropy across lifecycle periods with 95% confidence intervals. Panels: (a) Lens, (b) Tweetings, (c) Amazon. x-axis: Lifecycle Stages; y-axis: Transfer Entropy.]
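The transfer entropy discussed above can be illustrated with the standard definition, T(X→Y) = Σ p(y, y′, x) · log2[ p(y | y′, x) / p(y | y′) ], summed over y, y′ and x. The visible fragments (p(y | y′) and the sums over Y_s, Y_{s−1}, X_{s−1}) match this form, but whether the paper uses exactly this form is an assumption. The sketch below is a plug-in estimate over two discrete sequences; the function name and the toy sequences are hypothetical, and it does not reproduce how the talk builds X and Y from taste profiles.

```python
from collections import Counter
from math import log2

def transfer_entropy(x, y):
    """Plug-in estimate of T(X -> Y), in bits, for two discrete sequences."""
    # Each triple is (y_t, y_{t-1}, x_{t-1}).
    triples = list(zip(y[1:], y[:-1], x[:-1]))
    n = len(triples)
    c_yyx = Counter(triples)
    c_yx = Counter((yp, xp) for _, yp, xp in triples)  # (y_{t-1}, x_{t-1})
    c_yy = Counter((yt, yp) for yt, yp, _ in triples)  # (y_t, y_{t-1})
    c_y = Counter(yp for _, yp, _ in triples)          # y_{t-1}
    te = 0.0
    for (yt, yp, xp), c in c_yyx.items():
        p_joint = c / n                          # p(y, y', x)
        p_y_given_yx = c / c_yx[(yp, xp)]        # p(y | y', x)
        p_y_given_y = c_yy[(yt, yp)] / c_y[yp]   # p(y | y')
        te += p_joint * log2(p_y_given_yx / p_y_given_y)
    return te

# Hypothetical sequences: y repeats x with a lag of one step.
x_seq = [0, 1, 1, 0, 1, 0, 0, 1]
y_seq = [0] + x_seq[:-1]
te_driven = transfer_entropy(x_seq, y_seq)

# A constant x carries no information about y, so the estimate is zero.
te_constant = transfer_entropy([0] * 6, [0, 1, 0, 1, 0, 1])
```

In this toy case `te_driven` is positive because knowing the previous value of x removes all uncertainty about y, whereas `te_constant` is zero.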
33

Questions?
@mrowebot
m.rowe@lancaster.ac.uk
http://www.lancaster.ac.uk/staff/rowem/

Changing with Time: Modelling and Detecting User Lifecycle Periods in Online Community Platforms. M Rowe. International Conference on Social Informatics. Kyoto, Japan (2013)
Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction. M Rowe. International Conference on Data Mining. Dallas, US (2013)

From Mining to Understanding: The Evolution of Social Web Users

  • 1. FROM MINING TO UNDERSTANDING: THE EVOLUTION OF SOCIAL WEB USERS DR. MATTHEW ROWE SCHOOL OF COMPUTING AND COMMUNICATIONS @MROWEBOT | M.ROWE@LANCASTER.AC.UK Faculty of Science and Technology Christmas Conference Lancaster University, UK
  • 2. Our interests develop ‘offline’ over time: Primary School, High School, University, Postgrad, Postdoc, Lecturing.
  • 3. And so too do our social networks… Offline, we develop in terms of both our interests and social networks, over time: Primary School, High School, University, Postgrad, Postdoc, Lecturing.
  • 4. This also happens ‘online’, on the ‘Social Web’…
  • 5. First, Web 1.0
  • 6. Then, Web 2.0… the ‘Social Web’
  • 7. Why study user evolution? …to understand how people behave online; …to learn how people shape their identities; …to predict churners (from social networks and online communities); …to build better recommender systems.
  • 8. Talk Outline: User Lifecycles, Properties & Evolution Measures; Predicting Churners; Recommending Items; Conclusions.
  • 9. User Lifecycles
  • 10. Modelling User Evolution: Lifecycles. Offline lifecycle periods run over time from first action to last action: Primary School, High School, University, Postgrad, Postdoc, Lecturing. Lifecycle periods of a potential Question-Answering System user (conjecture!): Novice Users, Asking Questions, Asking & Answering Questions, Answering Questions. In reality we do not know the labels; however, we can split the lifetime into equal time intervals 1, 2, 3, …, n. Yet users distribute their activity non-uniformly across these intervals.
  • 11. User Properties in Lifecycle Stages. We divide the lifetime into equal activity periods 1, 2, 3, …, n, so that the number of actions (#actions) is the same in each period. Within each period we: model the actions to user u by other users; model the actions by user u to other users; model the terms used by user u (a Term/Count table; terms shown: Semantic, Web, Mining, Statistics; counts shown: 17, 5, 4, 3); and model the tastes of the user (an Item/Rating table: Alien 4*, Bladerunner 5*, Star Wars 4*).
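The equal-activity split described on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `activity_slices` and the choice to cut on integer index boundaries are assumptions made for the example.

```python
from typing import List, Sequence


def activity_slices(timestamps: Sequence[float], n_periods: int) -> List[List[float]]:
    """Split a user's actions into n_periods chunks holding (near) equal numbers of actions.

    Hypothetical helper: the slides only state that each lifecycle period
    should contain an equal proportion of the user's activity.
    """
    if n_periods <= 0:
        raise ValueError("n_periods must be positive")
    ordered = sorted(timestamps)  # actions in chronological order
    total = len(ordered)
    slices = []
    for k in range(n_periods):
        start = (k * total) // n_periods      # first action index of period k
        end = ((k + 1) * total) // n_periods  # one past the last action index
        slices.append(ordered[start:end])
    return slices
```

Each returned chunk is one lifecycle period; the period's time interval is given by its first and last timestamp, so busy users get short intervals and quiet users long ones.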
  • 12. How can we track the evolution of a user’s properties? Solution: use measures from information theory.
  • 13. How do the properties differ between time steps? Evolution measure 1: Cross-Entropy. Assuming we have a probability distribution (P) formed from a given lifecycle period ([t, t']) and a probability distribution (Q) from an earlier lifecycle period, we define the cross-entropy between the distributions as $H(P, Q) = -\sum_{x} p(x) \log q(x)$. Here P holds the user properties in period s and Q the user properties in period s-1. Decrease = similarity between properties. The measure is computed against distributions from earlier lifecycle periods, selecting the distribution that minimises cross-entropy. The period cross-entropy was derived for each platform's users throughout their lifecycles, and the mean cross-entropy was then derived for the 20 lifecycle periods.
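A small sketch of the cross-entropy measure, assuming each period's properties have already been counted (for example term counts). The additive smoothing in `to_distribution` is an assumption added here so that `log q(x)` stays defined for values unseen in the earlier period; the slides do not say how zero probabilities were handled.

```python
import math
from typing import Dict, Iterable


def to_distribution(counts: Dict[str, int], vocabulary: Iterable[str],
                    alpha: float = 1.0) -> Dict[str, float]:
    """Turn raw counts into a smoothed probability distribution.

    alpha is a hypothetical additive-smoothing constant (an assumption).
    """
    vocab = list(vocabulary)
    total = sum(counts.get(x, 0) for x in vocab) + alpha * len(vocab)
    return {x: (counts.get(x, 0) + alpha) / total for x in vocab}


def cross_entropy(p: Dict[str, float], q: Dict[str, float]) -> float:
    """H(P, Q) = -sum_x p(x) * log q(x), skipping terms where p(x) = 0."""
    return -sum(px * math.log(q[x]) for x, px in p.items() if px > 0)
```

When P and Q are identical the value reduces to the entropy of P, which is its minimum over Q; larger values indicate that the current period looks less like the earlier one.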
  • 14. How much information is transferred from one stage to the next? Evolution measure 2: Conditional Entropy. Conditional entropy assesses the information needed to describe the taste profile of a user at one time step (Q) using the taste profile from the previous period (P). A reduction in conditional entropy indicates that the user's taste profile is similar to that of the previous stage, while an increase indicates the converse (decrease = more information is transferred). For two discrete probability distributions representing taste profiles: $H(Q \mid P) = \sum_{x \in P,\, y \in Q} p(x, y) \log \frac{p(x)}{p(x, y)}$. Here Q holds the user properties in period s and P the user properties in period s-1. The conditional entropy was derived over the 5 lifecycle periods in a pairwise fashion, i.e. $H(P_2 \mid P_1), \ldots, H(P_5 \mid P_4)$, and the mean over each dataset's users in the training split was plotted together with 95% confidence intervals.
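The conditional entropy formula can be evaluated directly once a joint distribution over the two periods is available. The sketch below takes that joint distribution as given, keyed by `(x, y)` pairs; how the joint is estimated from the two taste profiles is not stated on the slide, so that input format is an assumption.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


def conditional_entropy(joint: Dict[Tuple[str, str], float]) -> float:
    """H(Q|P) = sum_{x,y} p(x, y) * log(p(x) / p(x, y)).

    joint maps (x, y) to p(x, y), with x from the earlier period (P)
    and y from the current period (Q).
    """
    marginal_x = defaultdict(float)
    for (x, _y), pxy in joint.items():
        marginal_x[x] += pxy  # p(x) = sum_y p(x, y)
    return sum(pxy * math.log(marginal_x[x] / pxy)
               for (x, _y), pxy in joint.items() if pxy > 0)
```

If the current period is fully determined by the previous one the result is 0; if the two are independent it equals the entropy of the current period.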
  • 15. How do global dynamics influence the user’s properties? Evolution measure 3: Transfer Entropy. We examine the information transfer from a prior lifecycle stage (s-1) to the current lifecycle stage (s) of the user. Assume a random variable describing the local categories that have been reviewed at the current stage ($Y_s$), a random variable of local categories at the previous stage ($Y_{s-1}$), and a third random variable of global categories at the previous stage ($X_{s-1}$). Based on the work of Schreiber [8], the transfer entropy of one lifecycle stage to another is $T_{X \to Y} = H(Y_s \mid Y_{s-1}) - H(Y_s \mid Y_{s-1}, X_{s-1})$. The first term is the surprise in user properties from s-1 to s; the second is the surprise in user properties in s when we also consider all users’ properties from s-1. Decrease = more susceptible to global influence; increase = less susceptible to global influence.
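The transfer entropy is a difference of two conditional entropies, which can be estimated from co-occurring observations. The sketch below assumes the data arrive as `(y_s, y_prev, x_prev)` triples and uses plain frequency estimates; both the input format and the estimator are assumptions for illustration, not the procedure used in the talk.

```python
import math
from collections import Counter
from typing import Hashable, List, Tuple


def _cond_entropy(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """Estimate H(A|B) from a list of (a, b) observations."""
    n = len(pairs)
    joint = Counter(pairs)                   # counts of (a, b)
    given = Counter(b for _a, b in pairs)    # counts of b
    return -sum((c / n) * math.log(c / given[b]) for (_a, b), c in joint.items())


def transfer_entropy(samples: List[Tuple[Hashable, Hashable, Hashable]]) -> float:
    """T_{X->Y} = H(Y_s | Y_{s-1}) - H(Y_s | Y_{s-1}, X_{s-1}).

    samples holds (y_s, y_prev, x_prev) triples (hypothetical input format).
    """
    local = [(ys, yp) for ys, yp, _xp in samples]
    with_global = [(ys, (yp, xp)) for ys, yp, xp in samples]
    return _cond_entropy(local) - _cond_entropy(with_global)
```

In this estimator the value is 0 when the global variable adds nothing beyond the user's own previous stage, and grows as the global variable removes more of the remaining uncertainty.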
  • 16. Predicting Churners via Evolution Signals ...from Online Communities
  • 17. Datasets: Online Communities. Table 1: Statistics of the online community platform datasets.
      Platform       Time Span                  Post Count   User Count
      Facebook       [18-08-2007, 24-01-2013]   118,432      4,745
      SAP            [15-12-2003, 20-07-2011]   427,221      32,926
      Server Fault   [01-08-2008, 31-03-2011]   234,790      33,285
    Defining Lifecycle Periods: in order to examine how users develop over time we needed some means to segment a user's lifetime (i.e. from the first date at which they post to their final post) into discrete intervals. Prior work [6, 2, 5] has demonstrated the extent to which users develop at their own pace and thus evolve according to their own ‘personal clock’ [5]. Hence, for deriving the lifecycle periods of users within the platforms we adopted an activity-slicing approach that divides a user's lifetime into 20 discrete time intervals, emulating the approach in [2], but with an equal proportion of activity within each period.
    [Figure 2: Posts per day for the datasets, with the churner ‘cutoff’ marked; axes: Time against Posts Frequency; panels (a) Facebook, (b) SAP, (c) Server Fault.]
  • 18. Cross-Entropy: dissimilarity with prior in-degree information. I.e. how do users who contact a given user differ from before? For the cross-entropy of users’ lexical term distributions we find the signals of churners and non-churners to follow a similar curvature (converging on a limit with a decaying rate) but with different magnitudes.
    [Figure: Time-period Cross Entropy against Lifecycle Stages for churners and non-churners; panels (a) In-degree Facebook, (b) In-degree SAP, (c) In-degree Server Fault.]
  • 19. Cross-Entropy: dissimilarity with community out-degree information. I.e. how do the users that a user contacted differ from the community?
    [Figure: Community Cross Entropy against Lifecycle Stages; panels (d) Out-degree Facebook, (e) Out-degree SAP, (f) Out-degree Server Fault.]
  • 20. Predicting Churners.
    1. Extract features from users’ evolution curves. The lifecycle period notation is altered from the interval tuple set ([t, t'] ∈ T) to a set of discrete single elements: s ∈ S, where S = {1, 2, …, 20}. For a measure m (e.g. in-degree period cross-entropy):
      •  Magnitude at period s: $m(u, s)$, the measure for user u taken at lifecycle period s.
      •  Rate, i.e. the change in the magnitude from period s to s+1: $\frac{dm}{ds}(u, s) = \frac{m(u, s+1) - m(u, s)}{m(u, s)}$.
      A feature vector x for a single user is formed from these magnitude and rate features, e.g. $m_1(u, 2), \ldots, m_1(u, 19), m_2(u, 2), \ldots, m_2(u, 19), \ldots$ for the magnitudes.
    2. Build the prediction model on the training split.
      •  Define the objective function using the standard linear model $f(x; w) = w^{\top} x$.
      •  Learn the model by minimising the cost function with respect to the weight vector w: $C(w) = \frac{1}{2|D_{train}|} \sum_{i=1}^{|D_{train}|} \big(f(x_i; w) - y_i\big)^2 + \lambda \lVert w \rVert_2^2$, where $\lVert w \rVert_2 = \big(\sum_{j=0}^{|w|} |w_j|^2\big)^{1/2}$ defines the L2-regulariser and $\lambda$ the weight of the regulariser's effect on the model, which controls for overfitting on the training split.
    3. Apply the model.
      •  Over ‘held-out’ data.
      •  Evaluate performance: how accurate is our predictor?
    [Figure residue: Community Cross Entropy against Lifecycle Stages; panel (c) In-degree Server Fault.]
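The rate features and the L2-regularised cost described on this slide can be written out in a few lines. This is a sketch under assumptions: the helper names are illustrative, `lam` stands for the regularisation weight, and the guard against a zero magnitude is added here because the slide does not say how that case was treated.

```python
from typing import List, Sequence


def rates(magnitudes: Sequence[float]) -> List[float]:
    """Relative change (m(u, s+1) - m(u, s)) / m(u, s) between consecutive periods."""
    out = []
    for cur, nxt in zip(magnitudes, magnitudes[1:]):
        if cur == 0:
            raise ValueError("rate is undefined when the magnitude is zero")
        out.append((nxt - cur) / cur)
    return out


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear model f(x; w) = w^T x."""
    return sum(wj * xj for wj, xj in zip(w, x))


def cost(w: Sequence[float], X: Sequence[Sequence[float]],
         y: Sequence[float], lam: float) -> float:
    """Squared error over the training set plus lam times the squared L2 norm of w."""
    squared_error = sum((predict(w, xi) - yi) ** 2 for xi, yi in zip(X, y))
    penalty = lam * sum(wj * wj for wj in w)
    return squared_error / (2 * len(X)) + penalty
```

Setting `lam` to 0 recovers plain least squares; larger values shrink the weights, which is the overfitting control the slide refers to.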
  • 21. Evaluation: Results. Area Under the receiver operator characteristic Curve (AUC) scores for the different regularisation models and the J48 baseline model from the state of the art (denoted by J48). Higher = better; Min = 0, Max = 1. Significance of improvement over the random model baseline is indicated.
      Platform                  Feature Set       λ=0        λ=1        λ=2        λ=5        λ=10
      Facebook (J48 = 0.586)    In-degree         0.535.     0.543.     0.538.     0.556*     0.549**
                                Out-degree        0.674***   0.666***   0.676***   0.696***   0.690***
                                Lexical           0.633***   0.630***   0.639***   0.637***   0.641***
                                Cross-period      0.649***   0.642***   0.649***   0.652***   0.651***
                                Cross-community   0.684***   0.693***   0.691***   0.699***   0.701***
                                All               0.811***   0.804***   0.816***   0.817***   0.819***
      SAP (J48 = 0.759)         In-degree         0.652***   0.651***   0.651***   0.652***   0.654***
                                Out-degree        0.741***   0.742***   0.742***   0.742***   0.743***
                                Lexical           0.501      0.501      0.501      0.499      0.497
                                Cross-period      0.614***   0.614***   0.614***   0.613***   0.612***
                                Cross-community   0.765***   0.765***   0.765***   0.765***   0.765***
                                All               0.816***   0.817***   0.817***   0.817***   0.818***
      Server Fault (J48 = 0.796) In-degree        0.659***   0.658***   0.662***   0.663***   0.663***
                                Out-degree        0.618***   0.617***   0.616***   0.619***   0.626***
                                Lexical           0.680***   0.682***   0.687***   0.686***   0.684***
                                Cross-period      0.671***   0.675***   0.680***   0.691***   0.689***
                                Cross-community   0.778***   0.779***   0.780***   0.778***   0.779***
                                All               0.858***   0.860***   0.861***   0.861***   0.860***
    Significance codes: p-value < 0.001 ***, 0.01 **, 0.05 *, 0.1 .
    Across all three platforms we find that performance improves as additional information is added into the models.
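For readers unfamiliar with the metric in this table, AUC can be computed as the fraction of (churner, non-churner) pairs that the model ranks correctly. The sketch below is a generic pairwise formulation, not the evaluation code behind the reported scores; the function name and the 1/0 label encoding are assumptions.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie
    return wins / (len(positives) * len(negatives))
```

A score of 0.5 corresponds to random ranking, which is why the Lexical rows for SAP (about 0.50) carry no significance markers.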
  • 22. By mining users’ evolution signals we can accurately predict who will churn, and who will not… this enables the early application of retention strategies.
  • 23. Recommending Items from Taste Evolution
  • 24.
  • 25. Recommender Systems aim to either: (i)  Predict item adoptions (ii)  Predict item ratings
  • 26. Recommendation Datasets: Item-Ratings. Table 1: Statistics of the review datasets used for our analysis and experiments.
      Dataset              #Users    #Items    #Ratings    Ratings Scale   Time Span
      MovieLens            6,024     3,678     902,585     [1,5]           [26-04-2000, 31-12-2000]
      Movie Tweetings      19,043    11,451    117,206     [1,10]          [28-02-2013, 23-09-2013]
      Amazon Movies & TV   889,173   253,059   7,880,387   [0,5]           [20-08-1997, 25-10-2012]
      Total                914,240   268,188   8,900,178
    Each rating is a quadruple (u, i, r, t): user u rated item i with score r at time t. We first define two sets, the former corresponding to the ratings by u during interval s for items from category c, and the latter to the total ratings by u during s, hence $D^{u,s,c}_{train} \subseteq D^{u,s}_{train} \subseteq D_{train}$:
      $D^{u,s,c}_{train} = \{(u, i, r, t) : (u, i, r, t) \in D_{train},\ t \in s,\ c \in cats(i)\}$
      $D^{u,s}_{train} = \{(u, i, r, t) : (u, i, r, t) \in D_{train},\ t \in s\}$
    We then define the function ave_rating to derive the average rating value from all of the rating quadruples in a given set:
      $ave\_rating(D^{u,s}_{train}) = \frac{1}{|D^{u,s}_{train}|} \sum_{(u,i,r,t) \in D^{u,s}_{train}} r$
    From these definitions we then derive the discrete probability distribution of the user's ratings per category.
    Amazon Movies: for the Amazon Movie and TV Reviews dataset we were provided with Amazon Standard Identification Numbers (ASINs) as identifiers of items. We looked up the ASINs for each item in the dataset by querying the Amazon Product Advertising API and returning the item information including title, actors, and directors. Unlike MovieLens and Movie Tweetings, we were not provided with the year of release from the API; therefore, to disambiguate semantic URIs we used the actor information from each movie, our intuition being that each movie would have a unique set of actors starring in it. We first identified candidate URIs for a given movie item by performing fuzzy matches between the item title and semantic URIs’ titles, and then derived the correct URI by comparing the set of actors associated with the item and the set of actors associated with each candidate URI.
    After alignment the reduction in the number of ratings is not as great as the reduction in items, which suggests two things: (i) mapped items are popular, and thus dominate the ratings; and (ii) obscure items are present within the data. In particular for the Amazon dataset, the alignment covers only 10.6% of items, suggesting that we cover the ‘head’ of the distribution in terms of popularity.
    [Figures: per-dataset distributions p(x), on log scales, of Average Rating, Lifetime (in days) and ratings per user; panels (a) Lens, (b) Tweetings, (c) Amazon; panel means shown on the slide: µ=7.7, µ=3.7, µ=4.1 and µ=5.8, µ=139.7, µ=12.5.]
  • 27. Forming Taste Profiles. Items are linked to semantic categories. For instance, for the movie ‘Alien’, released in 1979, which is used as a running example, the following categories are found:
      <http://dbpedia.org/resource/Alien_(film)>
          dcterms:subject category:Alien_(franchise)_films ;
          dcterms:subject category:1979_horror_films ;
          dcterms:subject category:Space_adventure_films ;
          dcterms:subject category:Films_set_in_the_future .
    Subject categories form a hierarchical structure such that parent categories define more general subjects. For instance category:Films_set_in_the_future is linked to category:Science_fiction_films_by_genre by the predicate skos:broader, thus providing a general taxonomic classification of the film. The advantage of such a structure is that we can explicitly identify a given user's tastes at a given point in time via the categories of films that they have consumed, and thus rated. This requires a link between a given item within one of our datasets and the semantic URI that denotes that movie item. In deriving semantic web URIs for the films we may encounter ambiguity where multiple films share the same title (this often happens with film remakes), so we use available information from each of our datasets to disambiguate the semantic URIs and thus return the correct alignment.
    Example, using the mean user score per category for a given user and lifecycle period:
      Item          Rating        Category          Rating
      Alien         4*            Space_adventure   (4+4)/2 = 4
      Bladerunner   5*            Science Fiction   (4+5+4)/3 = 4.3
      Star Wars     4*
    Probability of user u rating category c in lifecycle period s: $Pr(c \mid D^{u,s}_{train}) = \frac{ave\_rating(D^{u,s,c}_{train})}{\sum_{c' \in C^{u,s}_{train}} ave\_rating(D^{u,s,c'}_{train})}$
    [Figure 3: Average ratings derived using a 7-day moving average of the top-2 most frequently rated categories; axes: Time against Average Rating; panels (a) Lens, (b) Tweetings, (c) Amazon; categories shown include Independent Films, Directorial Debut Films, 1990s Comedy Films, American Films, Black and White Films.]
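The taste-profile arithmetic on this slide (mean score per category, then normalisation into a probability distribution) can be reproduced as below. The item-to-category mapping is an assumption inferred from the slide's sums, (4+4)/2 and (4+5+4)/3, and the function name `taste_profile` is illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def taste_profile(ratings: Dict[str, float],
                  categories: Dict[str, List[str]]) -> Dict[str, float]:
    """Average the user's ratings per category, then normalise to sum to 1."""
    per_category = defaultdict(list)
    for item, score in ratings.items():
        for category in categories[item]:
            per_category[category].append(score)
    averages = {c: sum(v) / len(v) for c, v in per_category.items()}
    total = sum(averages.values())
    return {c: a / total for c, a in averages.items()}


# Ratings from the slide; the category assignments are assumed for illustration.
ratings = {"Alien": 4, "Bladerunner": 5, "Star Wars": 4}
categories = {
    "Alien": ["Space_adventure", "Science Fiction"],
    "Bladerunner": ["Science Fiction"],
    "Star Wars": ["Space_adventure", "Science Fiction"],
}
profile = taste_profile(ratings, categories)
```

With these inputs the category averages are 4 and 13/3, so the profile assigns 0.48 to Space_adventure and 0.52 to Science Fiction.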
  • 28. Conditional-Entropy: relative information difference. I.e. how dissimilar are the user’s ratings in period s from period s-1? For MovieLens and Movie Tweetings we see an opposite effect: users’ taste profiles become less predictable as they develop; users rate items in a way that renders uncertainty in profiling from previous information.
    [Figure 5: Parent category conditional entropy between consecutive lifecycle stages (e.g. H(P2 |P3 )) across the datasets, together with the bounds of the 95% confidence interval; axes: Lifecycle Stages against Conditional Entropy; panels (a) Lens, (b) Tweetings, (c) Amazon.]
  • 29. Transfer-Entropy: influence of global behaviour on the user. I.e. how does collective user behaviour influence the user’s tastes? For Movie Tweetings and Amazon we find a different effect: users’ transfer entropy actually increases over time, indicating that users are less influenced by global taste preferences, and therefore the ratings of other users, and instead concentrate on their own tastes.
    [Figure 6: Parent category transfer entropy between consecutive lifecycle stages (e.g. H(P2 |P3 )) across the datasets, together with the bounds of the 95% confidence interval; axes: Lifecycle Stages against Transfer Entropy; panels (a) Lens, (b) Tweetings, (c) Amazon.]
  • 30. Including Taste Evolution in a Recommender System (current work!).
    Recommendation model formulation: we use a recommender system based upon matrix factorisation.
      •  Predict the rating of user u for item i: $\hat{r}_{ui} = b_{ui} + q_i^{\top} p_u$, where $q_i \in \mathbb{R}^f$ denotes the f latent factors associated with item i and $p_u \in \mathbb{R}^f$ the f latent factors associated with user u.
      •  Personalisation component of the recommendation model (Equation 19): $\hat{r}_{ui} = b_{ui} + q_i^{\top} \big(p_u + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j\big)$, where $y_j \in \mathbb{R}^f$ is a latent factor vector for item j from the set of items rated by user u, R(u). The latent factors are derived during learning. The number of factors to capture (f) is set a priori; this is often set to 50 within the literature. The factors capture unifying attributes across the items, for instance Romantic Comedies or Action Films in the movies domain.
      •  Modify the bias component to include the taste evolution signal: $b_{ui} = \underbrace{\mu + b_i + b_u}_{\text{static}} + \underbrace{b_{i,cats(i)} + b_{u,cats(i)}}_{\text{evolving}}$, where $b_{i,cats(i)}$ captures how global tastes for the categories of item i have evolved and $b_{u,cats(i)}$ captures how the tastes of user u have evolved for the categories of item i.
      •  Interpolate categories into the personalisation component: Equation 19 is modified to incorporate latent factors for each category that a user has rated an item from. Too much maths to be shown here!
    Static biases: the bias component of the model contains static biases induced from the training segment. These include the general bias of the given dataset (µ), which is the mean rating score across all ratings within the training segment. The use of the mean on its own is insufficient (note the variance in ratings scores for the Movie Tweetings dataset), therefore we also include the item bias ($b_i$) and the user bias ($b_u$). The former is the average deviation from the mean bias for item i within the training segment, while the latter is the average deviation from the mean bias of the training segment's ratings by user u.
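To make the bias decomposition concrete, the sketch below adds the static biases, the two evolving category biases, and the latent-factor interaction. It is only an illustration: the slides do not say how biases are aggregated when an item has several categories, so the function takes already-aggregated scalars, and every parameter name is hypothetical.

```python
from typing import Sequence


def predict_rating(mu: float, b_i: float, b_u: float,
                   b_i_cats: float, b_u_cats: float,
                   q_i: Sequence[float], p_u: Sequence[float]) -> float:
    """r_hat = (mu + b_i + b_u) + (b_i_cats + b_u_cats) + q_i . p_u

    b_i_cats and b_u_cats are assumed to be pre-aggregated over cats(i).
    """
    static_bias = mu + b_i + b_u
    evolving_bias = b_i_cats + b_u_cats
    interaction = sum(q * p for q, p in zip(q_i, p_u))
    return static_bias + evolving_bias + interaction
```

For example, with a dataset mean of 3.7, small positive category biases, and a two-factor interaction of 0.4, the prediction lands at 4.35; setting both category biases to zero recovers the plain biased matrix factorisation prediction.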
  • 31. By modelling taste evolution we can capture… (i) the influence of global dynamics on the user; (ii) how the user’s preferences for categories change; (iii) how global tastes are evolving. In Figure 3 the development of the average rating score is plotted across the top-2 most frequent categories, derived using a 7-day moving average to smooth the variance in rating fluctuations. There are peaks and troughs in the reviewing of the items that belong to the categories: for MovieLens the scores remain relatively stable, while for Movie Tweetings ‘Independent Films’ reduce in their average rating and ‘Directorial Debut Films’ increase in their average rating over time. Such information can be encoded within the biases of the recommendation models, which can then consider the stability of a given bias in light of when the rating is being made, i.e. considering the fluctuation of the rating signal and how this relates to previous fluctuations.
  • 32. Conclusions
  • 33. 1. User evolution can be captured using lifecycle models
    2. Users' tastes are susceptible to global taste influence
    3. Churners and non-churners exhibit divergent signals
    [Background text from the accompanying papers, partly illegible] 6.2.1 Static Biases: the bias component is b_ui = µ + b_i + b_u, where µ is the general bias of the given dataset (the mean rating score across all ratings within the training segment), b_i is the item bias and b_u is the user bias. 6.2.2 Category Biases: (text truncated). Further fragments describing magnitude and rate features over lifecycle periods are illegible.
    We derived the transfer entropy between consecutive lifecycle periods, as with the conditional entropy above, to examine how the influence of global and local dynamics on users' taste profiles developed over time. Figure 6 plots the means of these values across the lifecycle periods together with the 95% confidence intervals. We find that for users of MovieLens transfer entropy decreases over time, indicating that global dynamics have a stronger influence on users' taste profiles towards later lifecycle stages. Such an effect is characteristic of users becoming more involved and familiar with the review system, and as a consequence paying attention to more information from the users. With Movie Tweetings and Amazon we find a different effect: users' transfer entropy actually increases over time, indicating that users are less influenced by global taste preferences, and therefore the ratings of other users, and instead concentrate on their own tastes.
    [Figure: Transfer Entropy against Lifecycle Stages. Panels: (a) Lens, (b) Tweetings, (c) Amazon.]
    [Figure: Community Cross Entropy against Lifecycle Stages for churners and non-churners. Panels: (a) In-degree - Facebook, (b) In-degree - SAP, (c) In-degree - Server Fault, (d) Out-degree - Facebook, (e) Out-degree - SAP, (f) Out-degree - Server Fault, (g) Lexical - Facebook, (h) Lexical - SAP, (i) Lexical - Server Fault.]
    [Figure 7: distribution of average ratings, with µ = 3.7 marked; caption truncated.]
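The static bias fragment in the background text for this slide (a dataset mean µ combined with an item bias and a user bias) can be sketched as follows. This is an illustrative assumption rather than the author's implementation: the transcript does not say how the item and user biases are estimated, so the sketch uses plain residual means, and the function names and sample ratings are hypothetical.

```python
from collections import defaultdict


def fit_static_biases(train):
    """Estimate (mu, b_i, b_u) from (user, item, rating) triples.

    mu  : mean rating over the whole training segment
    b_i : mean residual (rating - mu) per item            (assumed estimator)
    b_u : mean residual (rating - mu - b_i) per user      (assumed estimator)
    """
    mu = sum(r for _, _, r in train) / len(train)

    item_residuals = defaultdict(list)
    for _, item, r in train:
        item_residuals[item].append(r - mu)
    b_i = {item: sum(v) / len(v) for item, v in item_residuals.items()}

    user_residuals = defaultdict(list)
    for user, item, r in train:
        user_residuals[user].append(r - mu - b_i[item])
    b_u = {user: sum(v) / len(v) for user, v in user_residuals.items()}

    return mu, b_i, b_u


def predict(mu, b_i, b_u, user, item):
    """Baseline prediction mu + b_i + b_u; unseen users or items get bias 0."""
    return mu + b_i.get(item, 0.0) + b_u.get(user, 0.0)


# Hypothetical training segment.
train = [
    ("alice", "film_a", 5.0),
    ("alice", "film_b", 3.0),
    ("bob", "film_a", 4.0),
    ("bob", "film_b", 2.0),
]
mu, b_i, b_u = fit_static_biases(train)
print(mu, b_i, b_u)
print(predict(mu, b_i, b_u, "bob", "film_b"))
```

Falling back to a zero bias for unseen users or items keeps the prediction at the dataset mean plus whatever is known, which is one simple way to handle cold-start cases.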
  • 34. Questions? @mrowebot | m.rowe@lancaster.ac.uk | http://www.lancaster.ac.uk/staff/rowem/
    Changing with Time: Modelling and Detecting User Lifecycle Periods in Online Community Platforms. M Rowe. International Conference on Social Informatics. Kyoto, Japan (2013)
    Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction. M Rowe. International Conference on Data Mining. Dallas, US (2013)