MINING USER LIFECYCLES FROM ONLINE COMMUNITY PLATFORMS AND THEIR APPLICATION TO CHURN PREDICTION
DR. MATTHEW ROWE
SCHOOL OF COMPUTING AND COMMUNICATIONS
@MROWEBOT | M.ROWE@LANCASTER.AC.UK
International Conference on Data Mining 2013
Dallas, USA
Identity Development: Offline
Development happens through stages
Development = conflicts

Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction
User Development: ‘Online’
¨  Recently studied in isolated dimensions:
¤  Socially (Telecoms Networks: Miritello et al. 2013)
n  Communication networks tend to a capacity
¤  Lexically (Online Communities: Danescu-Niculescu-Mizil et al. 2013)
n  Language adapts to the community, before diverging
¨  Without analysing development:
a)  Relative to earlier signals
b)  Relative to the community of interaction

Understanding User Development
enables…
1.  Recommender Systems
¤  Developmental stage-based user neighbourhoods (e.g. user-kNN)
¤  Modelling taste evolution (e.g. biases in MF)
n  Current/future work (more later)
[Figure: average rating over time (Mar to Nov) for item categories such as ‘Directorial Debut Films’ and ‘1990s Comedy Films’; panel (a) Lens, panel (b) truncated in the source]

2.  Churn Prediction
¤  Forecast churners from development signals
n  Focus of this talk
[Figure: ROC curves (True Positive Rate against False Positive Rate) for the feature sets Entropy, Period Entropy, Community Entropy, In-degree, Out-degree, Lexical and All]

Outline
¨  Datasets: Online Community Platforms
¨  Defining User Lifecycles and Properties
¨  Mining Lifecycle Trajectories
¨  Predicting Churners
¨  Findings and Conclusions
¨  Future Work

Datasets: Online Community Platforms

1.  Facebook ‘Open University’ Groups
¤  Containing discussions about courses and degrees
2.  SAP Community Network
¤  Question-answering system for SAP technologies
3.  Server Fault
¤  Stack Overflow subsidiary site for server-related issues

Paper excerpt: … examination of user lifecycles we used data collected from Facebook, the SAP Community Network (SAP) and Server Fault. Table 1 provides summary statistics of the datasets where we only considered users who had posted more than 40 times within their lifetime on the platform. The Facebook dataset was collected from groups discussing Open University courses, where users talked about their issues with the courses and guidance on studying. The SAP Community Network is a community question answering system related to SAP technologies where users post questions and provide answers related to technical issues. Similarly, Server Fault is a platform that is part of the Stack Overflow question answering site collection where users post questions related to server-related issues. We divided each platform’s users up into 80%/20% splits for training (and analysis) and testing, using the former in this section to examine user development and the latter split for our later detection experiments.

Table 1. Statistics of the online community platform datasets.
Platform      Time Span                  Post Count   User Count
Facebook      [18-08-2007, 24-01-2013]   118,432      4,745
SAP           [15-12-2003, 20-07-2011]   427,221      32,926
Server Fault  [01-08-2008, 31-03-2011]   234,790      33,285
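
The user filter and the user-wise 80%/20% split described above could be sketched as follows. This is a minimal illustration, not the authors' code: the `(user_id, timestamp)` input shape, the function name `split_users` and the fixed random seed are all assumptions made for the example.

```python
import random
from collections import Counter

def split_users(posts, min_posts=40, train_fraction=0.8, seed=0):
    """posts: iterable of (user_id, timestamp) pairs (hypothetical input shape).

    Keeps users with MORE than `min_posts` posts, then splits the users
    (not the posts) into training and testing groups.
    """
    counts = Counter(user for user, _ in posts)
    eligible = sorted(u for u, c in counts.items() if c > min_posts)
    rng = random.Random(seed)      # fixed seed only so the example is repeatable
    rng.shuffle(eligible)
    cut = int(len(eligible) * train_fraction)
    return sorted(eligible[:cut]), sorted(eligible[cut:])

# Toy data: ten active users (41 or 42 posts each) and one user with a single post.
posts = [(f"u{i}", t) for i in range(10) for t in range(41 + i % 2)]
posts.append(("lurker", 0))
train_users, test_users = split_users(posts)
```

Splitting by user rather than by post keeps every post of a given user on one side of the split, which is what a per-user lifecycle analysis needs.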

3.1 Defining Lifecycle Periods

In order to examine how users develop over time we needed some means to
User Lifecycles: Derivation
Offline Lifecycle Periods (over time):
Primary School | High School | University | Postgrad | Postdoc | Lecturing

Lifecycle Periods of a potential Question-Answering System user (conjecture!), from First Post to Last Post:
Novice Users | Asking Questions | Asking & Answering Questions | Answering Questions

In reality: do not know the labels, however we can split by equal time intervals:
1 | 2 | 3 | … | n

Yet, users non-uniformly distribute their activity across lifecycles
User Lifecycles: Properties
¨  Divide lifetime into equal activity periods: 1 | 2 | 3 | … | n, where each period holds the same number of posts (#posts in period 1 = #posts in period 2 = …)
n  We set n=20
¨  Capture period-specific user properties (in period s):
¤  In-degree distribution
n  Relative frequency distribution of senders to user u in period s
¤  Out-degree distribution
n  Relative frequency distribution of recipients from user u in s
¤  Term distribution
n  Relative frequency distribution of terms used by u in s
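
One way to picture the equal-activity split and the per-period distributions is the sketch below. The post representation (dictionaries with `time` and `text` keys), the whitespace tokeniser and the function names are assumptions for illustration only; the same `relative_frequencies` helper would apply to lists of senders (in-degree) or recipients (out-degree).

```python
from collections import Counter

def equal_activity_periods(posts, n=20):
    """Split one user's posts, ordered by time, into n chunks of (near-)equal size."""
    posts = sorted(posts, key=lambda p: p["time"])
    size, extra = divmod(len(posts), n)
    periods, start = [], 0
    for s in range(n):
        end = start + size + (1 if s < extra else 0)  # spread any remainder over early periods
        periods.append(posts[start:end])
        start = end
    return periods

def relative_frequencies(items):
    """Map each distinct item to its share of all observed items."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def term_distribution(period_posts):
    """Relative frequency of terms used in one period (naive whitespace tokens)."""
    return relative_frequencies(
        term for post in period_posts for term in post["text"].lower().split()
    )

# Toy user with 40 posts, so each of the 20 periods holds exactly 2 posts.
posts = [{"time": i, "text": "server down" if i % 2 else "server up"} for i in range(40)]
periods = equal_activity_periods(posts, n=20)
first_period_terms = term_distribution(periods[0])
```

Because the periods are defined by post counts rather than by calendar time, a burst of activity late in a user's lifetime occupies as many periods as its share of posts warrants.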

Analysing Development: Period Entropy

¨  Variation in users’ properties across periods
¨  Computed period entropy for each property

Generally stable trends: consistent variance in communication and terms

[Figure 1. Entropies of lifetime-stage distributions formed from users’ in-degrees, out-degrees and lexical terms. Panels: (a) In-degree, (b) Out-degree, (c) Lexical. x-axis: Lifecycle Stages (0 to 1); y-axis: Distribution Entropy; series: Facebook, SAP, Server Fault.]

Paper excerpt: … they develop in the community (for SAP and Facebook), however Server Fault users remain relatively stable. This could be due to the relatively minor interaction effects that take place on ServerFault: users largely lurk on the platform to seek answers to questions, and thus do not contribute unless it is necessary (i.e. they feel that their expertise is sufficient to answer a question or that a new question is required), as a result it is likely that users have an implicit understanding of how one should formulate a post and thus the language that should be used.
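
As a rough illustration of the period entropy indicator, the sketch below computes the Shannon entropy of one period's relative frequency distribution and then a per-period trajectory. The logarithm base (2) and the toy distributions are assumptions; the slides do not state the base used.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a relative frequency distribution {item: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Toy in-degree distributions for three consecutive lifecycle periods of one user:
# contact spread evenly over four senders, then two senders, then a single sender.
period_distributions = [
    {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25},
    {"a": 0.5, "b": 0.5},
    {"a": 1.0},
]
entropy_trajectory = [entropy(d) for d in period_distributions]
```

A higher value means the user's contacts (or terms) are spread more evenly within the period; a value of zero means a single sender, recipient or term accounts for everything.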
Analysing Development: Period Cross-Entropy

¨  Changes in properties relative to earlier
¨  Computed the minimised cross-entropy for each property

Convergence on prior properties
Convergence: lack of communication with new people, or use of new terms

[Figure 2. Cross-entropies derived from comparing users’ in-degree, out-degree and lexical term distributions with previous lifecycle periods. We see a consistent reduction in the cross-entropies over time. Panels: (a) In-degree, (b) Out-degree, (c) Lexical. x-axis: Lifecycle Stages (0 to 1); y-axis: Cross Entropy; series: Facebook, SAP, Server Fault.]

Paper excerpt: … that consistently across the platforms, users are contacted by people who have contacted them before and that fewer novel users appear. The same is also true for the out-degree distributions: users contact fewer new people than they did before. This is symptomatic of community platforms where despite new users arriving within the platform, users form sub-communities in which they interact and communicate with the same individuals. Figure 2(c) also demonstrates that users tend to reuse language over time and thus produce a gradually decaying cross-entropy curve.
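
The "minimised cross-entropy" against earlier periods could be computed along the following lines. This is a sketch under stated assumptions: the additive smoothing (`alpha`) used to avoid taking the logarithm of zero, the base-2 logarithm and all function names are choices made for the example, not details given in the talk.

```python
import math

def smoothed(q, support, alpha=1e-3):
    """Additively smooth q over `support` so every item has non-zero mass (an assumption)."""
    total = sum(q.values()) + alpha * len(support)
    return {x: (q.get(x, 0.0) + alpha) / total for x in support}

def cross_entropy(p, q, alpha=1e-3):
    """H(P, Q) = -sum_x p(x) * log2 q(x), with q smoothed over the joint support."""
    q_s = smoothed(q, set(p) | set(q), alpha)
    return -sum(px * math.log2(q_s[x]) for x, px in p.items() if px > 0)

def period_cross_entropy(dists, s, alpha=1e-3):
    """Minimum cross-entropy between period s and any earlier period (s >= 1)."""
    if s < 1:
        raise ValueError("the first period has no earlier period to compare with")
    return min(cross_entropy(dists[s], dists[j], alpha) for j in range(s))

# Toy term distributions: period 2 reuses exactly what was seen in period 0.
dists = [{"a": 1.0}, {"a": 0.5, "b": 0.5}, {"a": 1.0}]
novel = period_cross_entropy(dists, 1)      # introduces the unseen item "b"
reused = period_cross_entropy(dists, 2)     # matches an earlier period
```

Low values indicate that the period repeats something already seen in the user's past, which is the convergence pattern the slide describes.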
Analysing Development: Community Cross-Entropy

¨  Difference in properties relative to the community
¨  Computed cross-entropy for each property between user @ [t,t’] and community @ [t,t’]

Convergence on community properties
Divergence from the community
Convergence-divergence: first, adapt to community; second, separate

[Figure 3. Cross-entropies derived from comparing users’ in-degree, out-degree and lexical term distributions with the community platform over the same time periods. We see an increased divergence towards the end of lifecycles. Panels: (a) In-degree, (b) Out-degree, (c) Lexical. x-axis: Lifecycle Stages (0 to 1); y-axis: Cross Entropy; series: Facebook, SAP, Server Fault.]
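
A small sketch of the user-versus-community comparison over one interval [t,t’] follows. It assumes raw item counts for the user and for the whole community in that interval, and it again uses additive smoothing and a base-2 logarithm as illustrative choices; the names are hypothetical.

```python
import math
from collections import Counter

def community_cross_entropy(user_counts, community_counts, alpha=1e-3):
    """Cross-entropy of the user's distribution against the community's for one interval.

    Both arguments are Counters of items (terms, senders or recipients) observed
    in the same interval. `alpha` is an additive-smoothing constant (assumption).
    """
    support = set(user_counts) | set(community_counts)
    n_user = sum(user_counts.values())
    n_comm = sum(community_counts.values()) + alpha * len(support)
    h = 0.0
    for item, count in user_counts.items():
        p = count / n_user
        q = (community_counts.get(item, 0) + alpha) / n_comm
        h -= p * math.log2(q)
    return h

community = Counter({"raid": 50, "disk": 50})
typical_user = Counter({"raid": 2, "disk": 2})      # mirrors the community's vocabulary
diverging_user = Counter({"raid": 2, "kernel": 2})  # uses a term the community does not

h_typical = community_cross_entropy(typical_user, community)
h_diverging = community_cross_entropy(diverging_user, community)
```

Tracking this value per lifecycle period gives the convergence-divergence curve: it falls while the user adapts to the community and rises once the user separates from it.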
How can we model the evolution of individual users?
Solution: Mine Lifecycle Trajectories
i.e. fit a curve for each user’s development measure (property and indicator)

Properties: in-degree, out-degree, terms
Indicators: period entropy, period cross-entropy, community cross-entropy
Measures: property and indicator (e.g. in-degree period entropy)

Lifecycle Trajectories: Period Entropy

¨  Fitted per-user linear regression models
¤  Ind’ var: lifecycle period of the user. Dep’ var: entropy
¤  >80% of users R2 > 0.4

Paper excerpt: … development of user properties, setting the explanatory variable to be the lifecycle period of the user and the response variable to be the user property’s entropy. In modelling entropy development we can characterise each user using the slope (β) of the model, thus indicating the rate of change of entropy throughout the lifecycle periods. We induced user-specific entropy models for each platform’s users and then examined the cumulative frequency distribution of the β values for the different user properties and platforms, these are shown in Figure 4.

[Figure 4. Cumulative frequency distributions, F(x), of the linear regression models’ β values. Panels: (a) In-degree, (b) Out-degree, (c) Lexical; series: Facebook, SAP, Server Fault.]
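
The per-user linear model can be illustrated with ordinary least squares on one user's entropy values, with the lifecycle period as the explanatory variable. This is a minimal sketch: the function name, the toy entropy values and the handling of a perfectly flat series are assumptions for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 0.0  # flat series: report 0 by convention
    return alpha, beta, r_squared

# Toy user: entropy rising by 0.01 per lifecycle period over n=20 periods.
periods = list(range(1, 21))
entropies = [0.5 + 0.01 * s for s in periods]
alpha, beta, r_squared = fit_line(periods, entropies)
```

The slope `beta` is the single number that characterises the user's entropy trajectory, and `r_squared` is the quantity behind statements such as ">80% of users R2 > 0.4".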

Lifecycle Trajectories: Period Cross-Entropy

¨  Earlier: users converged on past behaviour
¤  I.e. previously seen terms, relationships, etc.
¨  First, examined the potential for exponential decay using the average proportional change of each feature:

$$\delta_{u_i} = \frac{1}{|T|-1} \sum_{\substack{[t,t'],[t',t''] \in T \\ t<t'<t''}} \frac{f(u_i,[t,t']) - f(u_i,[t',t''])}{f(u_i,[t,t'])} \qquad (6)$$

n  f(u_i, [t,t’]): feature value for interval [t,t’]
n  Feature: property and development indicator
¨  All users had a proportional change value <0
¨  Hence fitted exponential decay model:

$$g(u_i, s) = f(u_i, s_1)\, e^{-\lambda s}, \qquad \lambda = 1/\bar{x} \qquad (7)$$

n  x̄: average of user’s features

Paper excerpt: … This suggests that an exponential decay model would be suitable for describing such reductions throughout user’s lifecycles. Applying such a model requires that users reduce in their cross-entropy values over time. To examine whether this was indeed the case we defined the measure δu that returns the average proportional change value in the period cross-entropy for a given user throughout their lifecycles, letting f(ui, [t,t’]) denote a function that returns the period cross-entropy of an arbitrary user property (e.g. in-degree) for a given user and time interval (Equation 6). By deriving the distribution of average proportional change values (δ) across the different platforms and user properties we examined the proportion of users for whom the average change was greater than 0, and thus indicating decay overall. We found that for all tested measures, all users had an average proportional change value of greater than 0, thus suggesting the suitability of a decaying growth model. The exponential decay model requires one parameter to be provided, λ, that defines the decay rate of a given value x (e.g. in-degree period cross-entropy) over time, where λ = 1/x̄. We defined the lifecycle period for the exponential decay model using an integer value s = {1, 2, . . . , 20}, hence [t0, t0.05] ≡ s1, and then defined the exponential decay model as follows, letting f(s, ui) be a function that returns the period cross-entropy of an arbitrary feature (in-degree, out-degree, terms) for a given user and lifecycle period (Equation 7). As we induce a per-user parameter, and thus derive a
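
The two quantities on this slide can be sketched as below, assuming one user's period cross-entropy values are given as a list ordered by lifecycle period. The sign convention follows the paper excerpt (earlier value minus later value, so positive means decay), and the exponent is taken to be negative because the model is described as a decay; both are labelled assumptions, as are the function names.

```python
import math

def average_proportional_change(values):
    """delta = 1/(|T|-1) * sum over consecutive periods of (f_s - f_{s+1}) / f_s.

    Assumes every value except possibly the last is non-zero.
    """
    pairs = zip(values, values[1:])
    return sum((a - b) / a for a, b in pairs) / (len(values) - 1)

def decay_rate(values):
    """lambda = 1 / mean(values): the per-user decay parameter."""
    return len(values) / sum(values)

def exponential_decay(first_value, lam, s):
    """g(s) = f(s_1) * exp(-lambda * s)."""
    return first_value * math.exp(-lam * s)

# Toy user whose period cross-entropy halves in every period.
values = [0.8, 0.4, 0.2, 0.1]
delta = average_proportional_change(values)
lam = decay_rate(values)
modelled = [exponential_decay(values[0], lam, s) for s in range(1, 5)]
```

Under this convention a user with `delta` above zero is shrinking in cross-entropy on average, which is the precondition for fitting a decay curve at all.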
Lifecycle Trajectories: Community Cross-Entropy

¨  Identified differences between platforms and properties’ trajectory models
¤  In-degree:
n  Convergence-divergence: quadratic regression
¤  Out-degree:
n  Divergence: linear regression
¤  Lexical:
n  Facebook, SAP: quadratic regression
n  Server Fault: linear regression
¨  >73% of users have R2 > 0.4

[Figure 3 (repeated). Cross-entropies derived from comparing users’ in-degree, out-degree and lexical term distributions with the community platform over the same time periods. We see an increased divergence towards the end of lifecycles. Panels: (a) In-degree, (b) Out-degree, (c) Lexical. x-axis: Lifecycle Stages (0 to 1); y-axis: Cross Entropy; series: Facebook, SAP, Server Fault.]
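
A convergence-divergence trajectory is U-shaped, which is why a quadratic fit suits it. The sketch below fits y = a0 + a1·x + a2·x² by solving the least-squares normal equations in plain Python. Reading a1 and a2 as the linear and quadratic coefficients is an assumption about the talk's notation, and the toy curve is invented for the example.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients [a0, a1, a2] of y = a0 + a1*x + a2*x**2."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    moments = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x4 matrix of the normal equations.
    a = [[power_sums[i + j] for j in range(3)] + [moments[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

# Toy community cross-entropy curve over 20 lifecycle stages scaled to [0, 1]:
# it falls at first (convergence) and then rises (divergence).
stages = [i / 19 for i in range(20)]
curve = [3.0 - 4.0 * x + 5.0 * x * x for x in stages]
a0, a1, a2 = fit_quadratic(stages, curve)
turning_point = -a1 / (2 * a2)   # stage at which convergence gives way to divergence
```

For properties whose trajectory only rises (plain divergence), the quadratic term adds little and a straight line is the simpler model, matching the slide's split between quadratic and linear regression.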
Mining lifecycle trajectories enables users to be
categorised by their behaviour…

Facilitating Churn Prediction

Predicting Churners

¨  Binary classification task: is user u a churner?
¨  Dataset churners: who last posted before final 10%
¨  Dataset attributes from trajectory model features
¤  Facebook, SAP: 11 features
¤  Server Fault: 10 features
¨  Induced Logistic regression model:

$$\Pr(Y = 1 \mid x_i) = \frac{1}{1 + e^{-\beta^{\top} x_i}} \qquad (9)$$

n  Probability of user churning
¤  Induce coefficients via maximum likelihood estimation

Paper excerpt: In this section we define churn prediction as a binary classification task and use the previously examined indicators of lifecycle trajectories to predict whether a user is a churner or not. As we confine user lifecycle periods from the start of their lifecycle to the end we use the trajectories mined from this period to characterise how users develop. We define churners as any user who posts for the last time before the final 10% of the time window of our datasets, the cutoff points are: 2012-07-09 for Facebook, 2010-05-11 for SAP, and 2010-12-23 for ServerFault. Our dataset is of the following form: D = {(xi, yi)}, where yi denotes the class label of the user from one of two values: y ∈ {0, 1}, while xi denotes an 11-element R-valued feature vector for either a Facebook or SAP user, and a 10-element feature vector for a Server Fault user - given that we use a linear regression model for each user’s lexical community cross-entropy development. We model the feature vector of each user using the trajectory indicators from the previous section, in short Table II defines our set of features where we place each within a set depending on the dynamics it captures.

Table II. Features used for the churn prediction experiments. The indicators of lifecycle trajectories are used to characterise user evolution along the different user properties.
Property    Indicator          Model               Feature(s)   Platform
In-degree   Period Entropy     Linear Regression   β            All
In-degree   Period Cross-Ent   Exponential Decay   λ            All
In-degree   Comm’ Cross-Ent    Quad’ Regress’      a1, a2       All
Out-degree  Period Entropy     Linear Regression   β            All
Out-degree  Period Cross-Ent   Exponential Decay   λ            All
Out-degree  Comm’ Cross-Ent    Linear Regression   β            All
Lexical     Period Entropy     Linear Regression   β            All
Lexical     Period Cross-Ent   Exponential Decay   λ            All
Lexical     Comm’ Cross-Ent    Quad’ Regress’      a1, a2       Fb, SAP
Lexical     Comm’ Cross-Ent    Linear Regression   β            SF

A. Prediction Model Definition

The observed feature vector of user ui (xi) contains the indicator trajectories of the user along the different properties. We use the logistic regression model to predict the conditional probability of user ui churning as in Equation 9. The model’s coefficients (β) define the weight attached to each identity trajectory feature within the linear model.
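
The labelling rule and the classifier can be sketched together as follows. This is a toy illustration, not the authors' setup: it uses plain gradient ascent on the log-likelihood as a stand-in for whatever maximum likelihood solver was actually used, a single invented feature instead of the 10 or 11 trajectory features, and hypothetical function names throughout.

```python
import math

def label_churner(last_post_time, window_start, window_end):
    """1 if the user's last post falls before the final 10% of the dataset window."""
    cutoff = window_start + 0.9 * (window_end - window_start)
    return int(last_post_time < cutoff)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Pr(Y=1 | x) for weights = [bias, w_1, ..., w_d]."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(X, y, learning_rate=0.5, epochs=2000):
    """Approximate maximum likelihood estimation by batch gradient ascent."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, target in zip(X, y):
            error = target - predict_proba(weights, x)
            gradient[0] += error
            for j, value in enumerate(x, start=1):
                gradient[j] += error * value
        weights = [w + learning_rate * g / len(X) for w, g in zip(weights, gradient)]
    return weights

# Toy data: one trajectory feature per user; larger values churn more often.
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 1, 0, 1, 0, 1, 1]
weights = fit_logistic(X, y)
p_low, p_high = predict_proba(weights, [0.1]), predict_proba(weights, [0.9])
```

Ranking users by `predict_proba` gives the ordering that the evaluation measures in the next slide operate on.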
Evaluation: Setup
¨  User-wise dataset split: 80% training, 20% testing
¨  Experiments:
¤  Isolated user properties, isolated development indicator features, all features together
¨  Evaluation measures:
1.  Precision@k (P̄): Avg over k={1,5,10,20,50,100}
2.  Area Under the Receiver Operator Curve (AUC)
¨  Baseline: Success probability in single Bernoulli trial
¤  I.e. randomly selecting a churner
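
The two evaluation measures and the random baseline can be written down compactly. The sketch below is an illustration with invented scores and labels; the pairwise formulation of AUC (the share of churner/non-churner pairs ranked correctly, ties counting half) is a standard equivalent of the area under the ROC curve, and the small `ks` used here replaces the talk's k={1,5,10,20,50,100} only because the toy data has six users.

```python
def precision_at_k(scores, labels, k):
    """Share of true churners among the k users with the highest predicted scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k

def mean_precision(scores, labels, ks):
    """Average of Precision@k over the given cut-offs."""
    return sum(precision_at_k(scores, labels, k) for k in ks) / len(ks)

def auc(scores, labels):
    """Probability that a random churner is scored above a random non-churner."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def bernoulli_baseline(labels):
    """Precision expected when picking a user at random: the churner rate."""
    return sum(labels) / len(labels)

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # predicted churn probabilities
labels = [1, 1, 0, 1, 0, 0]               # 1 = churner
p_bar = mean_precision(scores, labels, ks=(1, 2, 4))
area = auc(scores, labels)
baseline = bernoulli_baseline(labels)
```

Precision@k rewards getting the top of the ranking right, while AUC rewards the ordering as a whole, which is why the two measures can favour different feature sets.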

Evaluation: Results

¨  Variance in features depending on:
¤  Accuracy preference
n  I.e. precision > recall
¤  Platform
n  Different detection signals for different communities
¨  Full model is never the best

Table III. Precision@k (P̄) and Area Under the Receiver Operator Characteristic Curve (AUC) values for Facebook, SAP and Server Fault when testing different: (i) user properties, (ii) development indicators, (iii) all features together.
Platform      Feature                   P̄       AUC
Facebook      Entropy                   0.761   0.500
Facebook      Period Cross Entropy      0.624   0.485
Facebook      Community Cross Entropy   0.791   0.617
Facebook      In-degree                 0.648   0.511
Facebook      Out-degree                0.781   0.570
Facebook      Lexical                   0.681   0.557
Facebook      Full                      0.730   0.573
Facebook      Baseline                  0.629   0.500
SAP           Entropy                   0.434   0.549
SAP           Period Cross Entropy      0.321   0.568
SAP           Community Cross Entropy   0.334   0.549
SAP           In-degree                 0.351   0.592
SAP           Out-degree                0.250   0.503
SAP           Lexical                   0.438   0.539
SAP           Full                      0.363   0.539
SAP           Baseline                  0.342   0.500
Server Fault  Entropy                   0.392   0.526
Server Fault  Period Cross Entropy      0.300   0.555
Server Fault  Community Cross Entropy   0.352   0.538
Server Fault  In-degree                 0.232   0.475
Server Fault  Out-degree                0.293   0.512
Server Fault  Lexical                   0.459   0.546
Server Fault  Full                      0.421   0.554
Server Fault  Baseline                  0.319   0.500

Paper excerpt (left margin truncated in the source; gaps marked […]): […] AUC) is preferable (thus achieving a value […] baseline for this measure is 0.5. […] presents the performance of the different models […] platforms, showing variation in the optimum […] evaluation measures. Interestingly, we find that […] features combined together does not yield the […] of the tested platforms. For Facebook the […] that the prediction model using community […] indicators performed best in terms of both P̄ […] tested the difference between this model and […] performing model (Full) using a Mann-Whitney […] the difference to be significant (at the 5% […] to the evaluation measure used: in-degree […] lexical features for P̄. These differences […] concentrating on top ranks and thus informing […] churners with high-levels of confidence can be […] assessing the term distributions of users […] dynamics, while for preferring recall the […] distributions is preferable. […] to Server Fault, the results also indicate […]

Evaluation: Churner Patterns

¨  Reduced quadratic coefficients: churners exhibit steep cross-community curves towards the end of their lifecycles
¨  Variance in decay coefficient: degree of communication decays a lot faster for SAP than Server Fault

Table IV. Best performing prediction model coefficients for Facebook (community cross-entropy), SAP (in-degree) and Server Fault (period cross-entropy). All features are significant within their respective models (α < 0.05).
Feature                         Facebook   SAP       Server Fault
In-degree Entropy               -          0.0532    -
In-degree Period Cross-Ent      -          0.0139    -0.1826
In-degree Comm’ Cross-Ent a1    -0.1057    -0.1878   -
In-degree Comm’ Cross-Ent a2    -0.0510    -1.5104   -
Out-degree Comm’ Cross-Ent      0.3173     -         -
Out-degree Period Cross-Ent     -          -         0.0210
Lexical Period Cross-Ent        -          -         0.0557
Lexical Comm’ Cross-Ent a1      0.3253     -         -
Lexical Comm’ Cross-Ent a2      -0.0541    -         -

Paper excerpt: … their in-degree distributions, and the extent to which they are contacted during one time period relative to their past communications, reduces at a much faster rate than on ServerFault.

VII. DISCUSSION AND FUTURE WORK

Prior work on social network evolution by Panzarasa et al. [6] and Miritello et al. [1] found that users’ social networks tend to a limit in terms of their communication capacity.
Conclusions
1.  Users communicate with a fixed-set of users
¤  Similar to findings from (Miritello et al. 2013)
2.  Convergence-divergence effect: users converge on community ‘norms’ before diverging
¤  (Erikson, 1959) theorised that younger people are susceptible to social norms
¤  (Danescu-Niculescu-Mizil et al. 2013) found users to converge on lexical norms, before diverging
3.  Variance in churner signals
¤  No common best model was found across platforms

Current & Future Work
1.  Regularised Linear Models
¤  Achieved ~30% AUC boost with growth and magnitude features
2.  Evolving-Taste Recommender System
¤  Used lifecycle model (n=5) to form category-ratings profiles
¤  Identified variance in taste evolution across platforms

[Figure: dissimilarity in taste profile from previous profile. Panels: (a) Lens, (b) Tweetings, (c) Amazon. x-axis: Lifecycle Stages (1 to 5); y-axis: Conditional Entropy.]

Paper excerpt: … that users tend to converge in their reviewing behaviour and that previous profiles allow one to gauge how the user will rate items in the future given their category information. Conversely, for MovieLens and Movie Tweetings we see an opposite effect: users’ taste profiles become less predictable as they develop; users rate items in a way that renders uncertainty in profiling from previous information.

Questions?
@mrowebot
m.rowe@lancaster.ac.uk
http://www.lancaster.ac.uk/staff/rowem/

Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction

Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction

  • 1. Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction. Dr. Matthew Rowe, School of Computing and Communications. @MROWEBOT | M.ROWE@LANCASTER.AC.UK. International Conference on Data Mining 2013, Dallas, USA.
  • 2. Identity Development: Offline. Development happens through stages. Development = conflicts.
  • 3. User Development: ‘Online’. Recently studied in isolated dimensions: socially (Telecoms Networks: Miritello et al. 2013), where communication networks tend to a capacity; and lexically (Online Communities: Danescu-Niculescu-Mizil et al. 2013), where language adapts to the community, before diverging. Without analysing development: a) relative to earlier signals; b) relative to the community of interaction.
  • 4. Understanding User Development enables… 1. Recommender Systems: developmental stage-based user neighbourhoods (e.g. user-kNN); modelling taste evolution (e.g. biases in MF); current/future work (more later). 2. Churn Prediction: forecast churners from development signals; focus of this talk. [Figures: average rating over time, with series labelled ‘Directorial Debut Films’ and ‘1990s Comedy Films’; ROC curves (true positive rate against false positive rate) with legend entries Entropy, Period Entropy, Community Entropy, In-degree, Out-degree, Lexical and All.]
  • 5. Outline: Datasets: Online Community Platforms; Defining User Lifecycles and Properties; Mining Lifecycle Trajectories; Predicting Churners; Findings and Conclusions; Future Work.
  • 6. Datasets: Online Community Platforms
      1. Facebook ‘Open University’ Groups: containing discussions about courses and degrees
      2. SAP Community Network: question-answering system for SAP technologies
      3. Server Fault: Stack Overflow subsidiary site for server-related issues
      For the examination of user lifecycles we used data collected from Facebook, the SAP Community Network (SAP) and Server Fault. Table 1 provides summary statistics of the datasets, where we only considered users who had posted more than 40 times within their lifetime on the platform. The Facebook dataset was collected from groups discussing Open University courses, where users talked about their issues with the courses and guidance on studying. The SAP Community Network is a community question answering system related to SAP technologies, where users post questions and provide answers related to technical issues. Similarly, Server Fault is a platform that is part of the Stack Overflow question answering site collection, where users post questions about server-related issues. We divided each platform's users into 80%/20% splits for training (and analysis) and testing, using the former in this section to examine user development and the latter split for our later detection experiments.
      Table 1. Statistics of the online community platform datasets.
      Platform       Time Span                   Post Count   User Count
      Facebook       [18-08-2007, 24-01-2013]    118,432      4,745
      SAP            [15-12-2003, 20-07-2011]    427,221      32,926
      Server Fault   [01-08-2008, 31-03-2011]    234,790      33,285
      3.1 Defining Lifecycle Periods. In order to examine how users develop over time we needed some means to …
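The filtering and splitting step described on this slide can be sketched as follows. This is a minimal illustration, not the author's code: the `(user_id, timestamp)` record layout, the function name `split_users`, the fixed seed, and the use of a random shuffle are all assumptions (the source states only the "more than 40 posts" filter and the 80%/20% proportions).

```python
import random
from collections import Counter


def split_users(posts, min_posts=40, train_fraction=0.8, seed=0):
    """Return (train_users, test_users) from (user_id, timestamp) pairs.

    Hypothetical helper: the record layout and the random shuffle are
    assumptions made for illustration only.
    """
    counts = Counter(user for user, _ in posts)
    # Keep only users who posted MORE than `min_posts` times.
    eligible = sorted(u for u, c in counts.items() if c > min_posts)
    rng = random.Random(seed)
    rng.shuffle(eligible)
    cut = int(len(eligible) * train_fraction)
    return eligible[:cut], eligible[cut:]


# Ten users with 41 posts each pass the filter; five users with 40 do not.
posts = [(f"active{i}", t) for i in range(10) for t in range(41)]
posts += [(f"casual{i}", t) for i in range(5) for t in range(40)]
train_users, test_users = split_users(posts)
```

Splitting by user rather than by post keeps every post of a given user on one side of the split, which matches the slide's wording that the users were divided.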
  • 7. User Lifecycles: Derivation. Offline lifecycle periods (over time): Primary School, High School, University, Postgrad, Postdoc, Lecturing. Lifecycle periods of a potential question-answering system user, from first post to last post (conjecture!): Novice Users, Asking Questions, Asking & Answering Questions, Answering Questions. In reality we do not know the labels, however we can split the lifetime by equal time intervals: 1, 2, 3, …, n. Yet, users non-uniformly distribute their activity across lifecycles.
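To see why equal time intervals are problematic, here is a small sketch that assigns post timestamps to n equal-width intervals between the first and last post. The function name `equal_time_periods` and the toy timestamps are assumptions made for illustration.

```python
from collections import Counter


def equal_time_periods(timestamps, n):
    """Label each timestamp with one of n equal-width intervals (1..n)."""
    first, last = min(timestamps), max(timestamps)
    width = (last - first) / n
    if width == 0:
        # All posts share one timestamp, so there is a single period.
        return [1 for _ in timestamps]
    labels = []
    for t in timestamps:
        # The last post sits on the right edge, so clamp it into period n.
        labels.append(min(int((t - first) / width) + 1, n))
    return labels


# A user who posts in a burst early on, then only occasionally.
timestamps = [0, 1, 2, 3, 4, 5, 50, 100]
labels = equal_time_periods(timestamps, n=4)
posts_per_period = Counter(labels)
```

In this toy case six of the eight posts land in the first interval and the second interval is empty, which is the non-uniform activity the slide points to.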
  • 8. User Lifecycles: Properties
    - Divide the lifetime into n equal activity periods, so that every period contains the same number of posts (#posts in period 1 = #posts in period 2 = … = #posts in period n). We set n = 20.
    - Capture period-specific user properties (in period s):
      - In-degree distribution: relative frequency distribution of senders to user u in period s.
      - Out-degree distribution: relative frequency distribution of recipients from user u in period s.
      - Term distribution: relative frequency distribution of terms used by u in period s.
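The equal-activity split described on this slide can be sketched as below. This is only an illustration under stated assumptions, not the author's implementation: the helper name `split_equal_activity`, the representation of posts as a list of timestamps, and the rule for spreading leftover posts over the earliest periods are all hypothetical.

```python
from typing import List, Sequence


def split_equal_activity(timestamps: Sequence[float], n: int = 20) -> List[List[float]]:
    """Split a user's posts into n chronological periods of near-equal post count.

    Hypothetical helper: when the post count is not divisible by n, the
    earliest periods each receive one extra post (an assumption; the slides
    do not say how remainders are handled).
    """
    ordered = sorted(timestamps)
    base, extra = divmod(len(ordered), n)
    periods: List[List[float]] = []
    start = 0
    for s in range(n):
        size = base + (1 if s < extra else 0)
        periods.append(ordered[start:start + size])
        start += size
    return periods


# 45 posts at irregular times, split into the n = 20 periods used in the talk.
posts = [float(t * t) for t in range(45)]
periods = split_equal_activity(posts, n=20)
sizes = [len(p) for p in periods]
```

Each inner list holds the posts of one lifecycle period; the per-period in-degree, out-degree and term distributions would then be computed from the posts in that list.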
  • 9. Analysing Development: Period Entropy
    - Variation in users' properties across periods.
    - Computed period entropy for each property.
    - Generally stable trends: consistent variance in communication and terms.
    … they develop in the community (for SAP and Facebook); however, Server Fault users remain relatively stable. This could be due to the relatively minor interaction effects that take place on Server Fault: users largely lurk on the platform to seek answers to questions, and thus do not contribute unless it is necessary (i.e. they feel that their expertise is sufficient to answer a question or that a new question is required). As a result, it is likely that users have an implicit understanding of how one should formulate a post and thus the language that should be used.
    Figure 1. Entropies of lifetime-stage distributions formed from users' in-degrees, out-degrees and lexical terms. (Panels: (a) In-degree, (b) Out-degree, (c) Lexical; x-axis: Lifecycle Stages; y-axis: Distribution Entropy; series: Facebook, SAP, Server Fault.)
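As a rough sketch of what a period entropy could look like in code, the snippet below builds a relative frequency distribution from the senders seen in one period and computes its Shannon entropy. The function names are hypothetical, and the logarithm base (bits) is an assumption, since the slides do not state which base was used.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable


def relative_frequencies(items: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency distribution of the observed items."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def entropy(dist: Dict[Hashable, float]) -> float:
    """Shannon entropy of a distribution, in bits (base 2 is an assumption)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Senders who contacted a user during one lifecycle period (made-up data).
senders = ["ann", "ann", "bob", "cat"]
in_degree_dist = relative_frequencies(senders)
h = entropy(in_degree_dist)
```

A higher value means the user's incoming messages are spread over more distinct senders; a user contacted by a single person has entropy 0.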
  • 10. Analysing Development: Period Cross-Entropy
    - Changes in properties relative to earlier periods.
    - Computed the minimised cross-entropy for each property.
    - Convergence on prior properties: lack of communication with new people, or use of new terms.
    … that consistently across the platforms, users are contacted by people who have contacted them before and that fewer novel users appear. The same is also true for the out-degree distributions: users contact fewer new people than they did before. This is symptomatic of community platforms where, despite new users arriving within the platform, users form sub-communities in which they interact and communicate with the same individuals. Figure 2(c) also demonstrates that users tend to reuse language over time and thus produce a gradually decaying cross-entropy curve.
    Figure 2. Cross-entropies derived from comparing users' in-degree, out-degree and lexical term distributions with previous lifecycle periods. We see a consistent reduction in the cross-entropies over time. (Panels: (a) In-degree, (b) Out-degree, (c) Lexical; x-axis: Lifecycle Stages; y-axis: Cross Entropy; series: Facebook, SAP, Server Fault.)
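One way to read the period cross-entropy is as the smallest cross-entropy between the current period's distribution and any earlier period's distribution. The sketch below follows that reading. Several details are assumptions rather than facts from the slides: the direction of the comparison (current period as p, earlier period as q), the additive smoothing constant `eps` used to avoid log(0) for unseen items, the base-2 logarithm, and the function names.

```python
import math
from typing import Dict, Sequence

Distribution = Dict[str, float]


def cross_entropy(p: Distribution, q: Distribution, eps: float = 1e-6) -> float:
    """H(p, q) = -sum_x p(x) * log2 q(x), with q additively smoothed.

    Smoothing (an assumption) gives items unseen in q a small non-zero
    probability; q is renormalised over the joint support of p and q.
    """
    support = set(p) | set(q)
    norm = 1.0 + eps * len(support)
    return -sum(
        px * math.log2((q.get(x, 0.0) + eps) / norm)
        for x, px in p.items()
        if px > 0
    )


def period_cross_entropy(current: Distribution, earlier: Sequence[Distribution]) -> float:
    """Minimum cross-entropy between the current period and each earlier period."""
    return min(cross_entropy(current, past) for past in earlier)


earlier_periods = [{"ann": 0.5, "bob": 0.5}, {"cat": 1.0}]
current_period = {"ann": 0.5, "bob": 0.5}
value = period_cross_entropy(current_period, earlier_periods)
```

The first lifecycle period has no earlier period to compare against, so `period_cross_entropy` would need at least one entry in `earlier`.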
  • 11. Analysing Development: Community Cross-Entropy
    - Difference in properties relative to the community.
    - Computed cross-entropy for each property between the user at [t,t'] and the community at [t,t'].
    - Convergence on community properties, then divergence from the community.
    - Convergence-divergence: first, adapt to the community; second, separate.
    Figure 3. Cross-entropies derived from comparing users' in-degree, out-degree and lexical term distributions with the community platform over the same time periods. We see an increased divergence towards the end of lifecycles. (Panels: (a) In-degree, (b) Out-degree, (c) Lexical; x-axis: Lifecycle Stages; y-axis: Cross Entropy; series: Facebook, SAP, Server Fault.)
  • 12. How can we model the evolution of individual users?
    - Solution: mine lifecycle trajectories, i.e. fit a curve for each user's development measure (property and indicator).
    - Properties: in-degree, out-degree, terms.
    - Indicators: period entropy, period cross-entropy, community cross-entropy.
    - Measures: property and indicator (e.g. in-degree period entropy).
  • 13. Lifecycle Trajectories: Period Entropy
    - Fitted per-user linear regression models.
      - Independent variable: lifecycle period [0,1]. Dependent variable: entropy.
      - >80% of users have R² > 0.4.
    … development of user properties, setting the explanatory variable to be the lifecycle period of the user and the response variable to be the user property's entropy. In modelling entropy development we can characterise each user using the slope (β) of the model, thus indicating the rate of change of entropy throughout the lifecycle periods. We induced user-specific entropy models for each platform's users and then examined the cumulative frequency distribution of the β values for the different user properties and platforms; these are shown in Figure 4.
    Figure 4. Cumulative frequency distributions of linear regression models' … (Panels: (a) In-degree, (b) Out-degree, (c) Lexical; x-axis: β; y-axis: F(x); series: Facebook, SAP, Server Fault.)
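A per-user trajectory of this kind can be sketched with ordinary least squares, where the lifecycle period is the explanatory variable and the period entropy is the response. The code below is a plain-Python illustration with made-up data; the function name `fit_line` and the intercept symbol `alpha` are assumptions, and only the slope β and the R² threshold are taken from the slides.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    # R^2 is undefined for a constant response; it is reported as 0.0 here.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return alpha, beta, r_squared


# Lifecycle periods scaled to (0, 1] for n = 20, with a synthetic entropy series.
periods = [s / 20 for s in range(1, 21)]
entropies = [2.0 + 0.5 * x for x in periods]
alpha, beta, r_squared = fit_line(periods, entropies)
```

The slope `beta` is the per-user feature: a positive value means the user's entropy grows over the lifecycle, a negative value that it shrinks.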
  • 14. Lifecycle Trajectories: Period Cross-Entropy
    - Earlier: users converged on past behaviour, i.e. previously seen terms, relationships, etc.
    - Computed the average proportional change of each feature (feature: property and development indicator), using the feature value for each interval [t,t'].
    - All users' period cross-entropy decreased on average, hence we fitted an exponential decay model, with λ set from the average of the user's feature values.
    The period cross-entropies examined earlier, derived as the minimum cross-entropy when comparing users' properties with past properties, indicated clear decreasing trends. That is, users converge on their past behaviour over time. This suggests that an exponential decay model would be suitable for describing such reductions throughout users' lifecycles. Applying such a model requires that users reduce in their period cross-entropy values over time. To examine whether this was indeed the case we defined the measure $\delta_{u_i}$ that returns the average proportional change in the period cross-entropy for a given user throughout their lifecycle, letting $f(u_i, [t,t'])$ denote a function that returns the period cross-entropy of an arbitrary user property (e.g. in-degree) for a given user and interval:
    $$\delta_{u_i} = \frac{1}{|T|-1} \sum_{\substack{[t,t'],[t'',t'''] \in T \\ t<t'<t''<t'''}} \frac{f(u_i,[t,t']) - f(u_i,[t'',t'''])}{f(u_i,[t,t'])} \qquad (6)$$
    Because the later value is subtracted from the earlier one, $\delta_{u_i} > 0$ indicates decay. By deriving the distribution of average proportional change values (δ) across the different platforms and user properties we examined the proportion of users for whom the average change was greater than 0, thus indicating decay overall. We found that for all tested measures, all users had an average proportional change value greater than 0, thus suggesting the suitability of a decaying model. Therefore we chose the exponential decay model. It requires one parameter, λ, that defines the decay rate of a given value x (e.g. in-degree period cross-entropy) over time, where $\lambda = 1/\bar{x}$. We defined the lifecycle period for the exponential decay model using an integer value $s \in \{1, 2, \ldots, 20\}$, hence $[t_0, t_{0.05}] \equiv s_1$, and then defined the model as follows, letting $f(u_i, s)$ be a function that returns the period cross-entropy of an arbitrary feature (in-degree, out-degree, terms) for a given user and lifecycle period:
    $$g(u_i, s) = f(u_i, s_1)\, e^{-\lambda s} \qquad (7)$$
    As we induce a per-user parameter, and thus derive a ...
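The two quantities on this slide can be illustrated as follows. This is a sketch under assumptions, not the paper's code: the proportional change is taken between consecutive periods as (earlier - later) / earlier, so a positive average means decay; the decay curve is assumed to have the form g(s) = f(s_1) · exp(-λ·s) with λ = 1 / mean(values); and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_proportional_change(values: Sequence[float]) -> float:
    """Mean of (earlier - later) / earlier over consecutive periods.

    Assumes consecutive pairs and non-zero earlier values; a positive
    result means the series decays on average.
    """
    pairs = list(zip(values, values[1:]))
    return sum((earlier - later) / earlier for earlier, later in pairs) / len(pairs)


def decay_rate(values: Sequence[float]) -> float:
    """lambda = 1 / mean(values), the single parameter of the decay model."""
    return len(values) / sum(values)


def decay_model(first_value: float, lam: float, s: int) -> float:
    """Assumed model form: g(s) = f(s_1) * exp(-lambda * s)."""
    return first_value * math.exp(-lam * s)


# A made-up period cross-entropy series that halves at every period.
cross_entropies = [0.8, 0.4, 0.2, 0.1]
delta = average_proportional_change(cross_entropies)
lam = decay_rate(cross_entropies)
fitted: List[float] = [decay_model(cross_entropies[0], lam, s) for s in range(1, 5)]
```

Under these assumptions λ acts as the per-user feature: users whose cross-entropy values are small on average receive a large λ and therefore a steeper fitted decay.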
  • 15. Lifecycle Trajectories: Community Cross-Entropy
    - Identified differences between platforms' and properties' trajectory models:
      - In-degree: quadratic regression (convergence-divergence).
      - Out-degree: linear regression (divergence).
      - Lexical: quadratic regression for Facebook and SAP; linear regression for Server Fault.
    - >73% of users have R² > 0.4.
    Figure 3. Cross-entropies derived from comparing users' in-degree, out-degree and lexical term distributions with the community platform over the same time periods. We see an increased divergence towards the end of lifecycles. (Panels: (a) In-degree, (b) Out-degree, (c) Lexical; x-axis: Lifecycle Stages; y-axis: Cross Entropy; series: Facebook, SAP, Server Fault.)
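A convergence-divergence curve can be captured by a quadratic fit, sketched below with the normal equations solved by Gaussian elimination. The parameterisation y = a0 + a1·x + a2·x² is an assumption; the slides name coefficients a1 and a2 elsewhere but do not spell out the model, and the helper names here are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a0 + a1*x + a2*x^2 via the normal equations."""
    powers = [sum(x ** k for x in xs) for k in range(5)]
    lhs = [[powers[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    a0, a1, a2 = solve3(lhs, rhs)
    return a0, a1, a2


# Synthetic community cross-entropy that first falls, then rises again.
periods = [s / 20 for s in range(1, 21)]
values = [3.0 - 4.0 * x + 5.0 * x * x for x in periods]
a0, a1, a2 = fit_quadratic(periods, values)
turning_point = -a1 / (2 * a2)  # lifecycle position where divergence begins
```

With a positive a2 the curve is U-shaped, and `turning_point` marks where a user stops converging on the community and starts diverging from it.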
  • 16. Mining lifecycle trajectories enables users to be categorised by their behaviour… Facilitating Churn Prediction
  • 17. Predicting Churners
    - Binary classification task: is user u a churner?
    - Dataset churners: users who last posted before the final 10% of the dataset's time window.
    - Dataset attributes from trajectory model features: 11 features for Facebook and SAP, 10 features for Server Fault.
    - Induced logistic regression model: induce coefficients via maximum likelihood estimation, then compute the probability of a user churning.
    In this section we define churn prediction as a binary classification task and use the previously examined indicators of lifecycle trajectories to predict whether a user is a churner or not. As we confine user lifecycle periods from the start of their lifecycle to the end, we use the trajectories mined from this period to characterise how users develop. We define churners as any user who posts for the last time before the final 10% of the time window of our datasets; the cutoff points are 2012-07-09 for Facebook, 2010-05-11 for SAP, and 2010-12-23 for Server Fault. Our dataset is of the following form: $D = \{(x_i, y_i)\}$, where $y_i$ denotes the class label of the user from one of two values, $y \in \{0, 1\}$, while $x_i$ denotes an 11-element real-valued feature vector for either a Facebook or SAP user, and a 10-element feature vector for a Server Fault user, given that we use a linear regression model for each Server Fault user's lexical community cross-entropy development. We model the feature vector of each user using the trajectory indicators from the previous section; in short, Table II defines our set of features, where we place each within a set depending on the dynamics it captures.
    Table II. Features used for the churn prediction experiments. The indicators of lifecycle trajectories are used to characterise user evolution along the different user properties.
    Property    Indicator         Model               Feature(s)  Platform
    In-degree   Period Entropy    Linear Regression   β           All
    In-degree   Period Cross-Ent  Exponential Decay   λ           All
    In-degree   Comm' Cross-Ent   Quad' Regress'      a1, a2      All
    Out-degree  Period Entropy    Linear Regression   β           All
    Out-degree  Period Cross-Ent  Exponential Decay   λ           All
    Out-degree  Comm' Cross-Ent   Linear Regression   β           All
    Lexical     Period Entropy    Linear Regression   β           All
    Lexical     Period Cross-Ent  Exponential Decay   λ           All
    Lexical     Comm' Cross-Ent   Quad' Regress'      a1, a2      Fb, SAP
    Lexical     Comm' Cross-Ent   Linear Regression   β           SF
    A. Prediction Model Definition: The observed feature vector of user $u_i$ ($x_i$) contains the indicator trajectories of the user along the different properties. We use the logistic regression model to predict the conditional probability of user $u_i$ churning as follows:
    $$\Pr(Y = 1 \mid x_i) = \frac{1}{1 + e^{-f(x_i)}} \qquad (9)$$
    where $f(x_i)$ is the linear model over the trajectory features. The model's coefficients define the weight attached to each identity trajectory feature within the linear model.
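The churner labelling rule and the logistic prediction step can be sketched as below. Only the Facebook cutoff date comes from the text; the function names, the example weights and the bias term are hypothetical, and the logistic function is written in its standard textbook form rather than copied from the paper.

```python
import math
from datetime import date
from typing import Sequence


def is_churner(last_post: date, cutoff: date) -> int:
    """Label 1 if the user's last post falls before the cutoff date, else 0."""
    return 1 if last_post < cutoff else 0


def churn_probability(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Standard logistic function applied to a weighted sum of trajectory features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable form for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


facebook_cutoff = date(2012, 7, 9)  # start of the final 10% of the Facebook window
label_early = is_churner(date(2012, 3, 1), facebook_cutoff)
label_late = is_churner(date(2012, 12, 1), facebook_cutoff)

# Hypothetical weights for a two-feature model; real weights would be
# estimated by maximum likelihood on the training split.
p_neutral = churn_probability([0.0, 0.0], [0.3, -0.1])
```

In practice the weights would come from fitting a logistic regression on the 80% training split, with one weight per feature in the user's 10- or 11-element vector.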
  • 18. Evaluation: Setup
    - User-wise dataset split: 80% training, 20% testing.
    - Experiments: isolated user properties, isolated development indicator features, all features together.
    - Evaluation measures:
      1. Precision@k (P̄): averaged over k = {1, 5, 10, 20, 50, 100}.
      2. Area Under the Receiver Operator Characteristic Curve (AUC).
    - Baseline: success probability in a single Bernoulli trial, i.e. randomly selecting a churner.
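The evaluation measures listed here can be sketched in plain Python as follows. This is an illustration with made-up scores and labels; the function names are hypothetical, the AUC is computed through its pairwise-ranking interpretation, and the baseline is read as the proportion of churners in the test set.

```python
from typing import Sequence


def precision_at_k(ranked_labels: Sequence[int], k: int) -> float:
    """Fraction of churners among the top-k users, ranked by predicted probability."""
    return sum(ranked_labels[:k]) / k


def mean_precision_at_k(ranked_labels: Sequence[int], ks: Sequence[int] = (1, 5, 10, 20, 50, 100)) -> float:
    """Average of Precision@k over the given cut-offs."""
    return sum(precision_at_k(ranked_labels, k) for k in ks) / len(ks)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random churner is scored above a random non-churner (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_baseline(labels: Sequence[int]) -> float:
    """Success probability of picking a churner at random (one Bernoulli trial)."""
    return sum(labels) / len(labels)


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
ranked = [y for _, y in sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
mean_p = mean_precision_at_k(ranked, ks=(1, 2, 4))  # small cut-offs for the toy data
auc_value = auc(scores, labels)
baseline = random_baseline(labels)
```

The toy example uses small cut-offs because it has only six users; with a real test set the default cut-offs from the slide would apply.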
  • 19. Evaluation: Results
    - Variance in features depending on:
      - Accuracy preference, i.e. precision > recall.
      - Platform: different detection signals for different communities.
    - The Full model is never the best.
    Table III. Precision@k (P̄) and Area Under the Receiver Operator Characteristic Curve (AUC) values for Facebook, SAP and Server Fault when testing different: (i) user properties, (ii) development indicators, (iii) all features together.
    Platform      Feature                  P̄      AUC
    Facebook      Entropy                  0.761  0.500
    Facebook      Period Cross Entropy     0.624  0.485
    Facebook      Community Cross Entropy  0.791  0.617
    Facebook      In-degree                0.648  0.511
    Facebook      Out-degree               0.781  0.570
    Facebook      Lexical                  0.681  0.557
    Facebook      Full                     0.730  0.573
    Facebook      Baseline                 0.629  0.500
    SAP           Entropy                  0.434  0.549
    SAP           Period Cross Entropy     0.321  0.568
    SAP           Community Cross Entropy  0.334  0.549
    SAP           In-degree                0.351  0.592
    SAP           Out-degree               0.250  0.503
    SAP           Lexical                  0.438  0.539
    SAP           Full                     0.363  0.539
    SAP           Baseline                 0.342  0.500
    Server Fault  Entropy                  0.392  0.526
    Server Fault  Period Cross Entropy     0.300  0.555
    Server Fault  Community Cross Entropy  0.352  0.538
    Server Fault  In-degree                0.232  0.475
    Server Fault  Out-degree               0.293  0.512
    Server Fault  Lexical                  0.459  0.546
    Server Fault  Full                     0.421  0.554
    Server Fault  Baseline                 0.319  0.500
    … AUC) is preferable …; the baseline for this measure is 0.5. … presents the performance of the different models … platforms, showing variation in the optimum … evaluation measures. Interestingly, we find that … features combined together does not yield the … of the tested platforms. For Facebook … the prediction model using community … indicators performed best in terms of both P̄ … tested the difference between this model and … performing model (Full) using a Mann-Whitney … the difference to be significant (at the 5% … found differences in the … best performing … to the evaluation measure used: in-degree … lexical features for P̄. These differences … concentrating on top ranks and thus informing … churners with high levels of confidence can be … assessing the term distributions of users … dynamics, while for preferring recall the … distributions is preferable. … to Server Fault, the results also indicate …
  • 20. Evaluation: Churner Patterns
    - Reduced quadratic coefficients: churners exhibit steep cross-community curves towards the end of their lifecycles.
    - Variance in decay coefficient: the degree of communication decays a lot faster for SAP than for Server Fault.
    … their in-degree distributions, and the extent to which they are contacted during one time period relative to their past communications, reduces at a much faster rate than on Server Fault.
    Table IV. Best performing prediction model coefficients for Facebook (community cross-entropy), SAP (in-degree) and Server Fault (period cross-entropy). All features are significant within their respective models (α < 0.05).
    Feature                        Facebook  SAP      Server Fault
    In-degree Entropy              -         0.0532   -
    In-degree Period Cross-Ent     -         0.0139   -0.1826
    In-degree Comm' Cross-Ent a1   -0.1057   -0.1878  -
    In-degree Comm' Cross-Ent a2   -0.0510   -1.5104  -
    Out-degree Comm' Cross-Ent     0.3173    -        -
    Out-degree Period Cross-Ent    -         -        0.0210
    Lexical Period Cross-Ent       -         -        0.3253
    Lexical Comm' Cross-Ent a1     -0.0541   -        -
    Lexical Comm' Cross-Ent a2     -0.0557   -        -
    VII. Discussion and Future Work: Prior work on social network evolution by Panzarasa et al. [6] and Miritello et al. [1] found that users' social networks tend to a limit in terms of their communication capacity.
  • 21. Conclusions
    1. Users communicate with a fixed set of users.
       - Similar to findings from (Miritello et al. 2013).
    2. Convergence-divergence effect: users converge on community 'norms' before diverging.
       - (Erikson 1959) theorised that younger people are susceptible to social norms.
       - (Danescu-Niculescu-Mizil et al. 2013) found users to converge on lexical norms, before diverging.
    3. Variance in churner signals.
       - No common best model was found across platforms.
  • 22. Current & Future Work
    1. Regularised Linear Models
       - Achieved ~30% AUC boost with growth and magnitude features.
    2. Evolving-Taste Recommender System
       - Used lifecycle model (n=5) to form category-ratings profiles.
       - Identified variance in taste evolution across platforms.
       - Dissimilarity in taste profile from previous profile.
    … that users tend to converge in their reviewing behaviour and that previous profiles allow one to gauge how the user will rate items in the future given their category information. Conversely, for MovieLens and Movie Tweetings we see an opposite effect: users' taste profiles become less predictable as they develop; users rate items in a way that renders … from previous information.
    Figure: conditional entropy of taste profiles across lifecycle stages 1 to 5. (Panels: (a) Lens, (b) Tweetings, (c) Amazon; x-axis: Lifecycle Stages; y-axis: Conditional Entropy.)
  • 23. Questions? @mrowebot m.rowe@lancaster.ac.uk http://www.lancaster.ac.uk/staff/rowem/