Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Social Tagging Recommender Systems

20130513
Speaker: Yan Kai Huang
NTU Internet Research Lab

Outline
• Introduction
• Related Works
• Cold Start:
– Random Walk and Probability Assignment
• Algorithms
• Temporal Decay Issues
• Credibility & Accordance
• Experiment Design
• Discussion and Conclusion

Introduction to
Recommender Systems
• Recommendation systems (RS) help to match users
with items
– Ease information overload
– Sales assistance (guidance, advisory, persuasion,…)
• Collaborative Filtering
– Considers Users with Similar Rating Patterns
– Aggregates the ratings of Similar Users
• Social Networks Emerged Recently
– Independent source of information
• Motivations of Trust-based RS
– Social Influence: users adopt the behavior of their
friends

Motivation
• User-generated data is obtained from an existing website
– instead of a random graph generator
– e.g. ER model, BA model, WS model, etc.
– Unable to generate uni-partite graphs, not to mention bipartite ones.
• “Knowledge discovery”
– What are the characteristics of user-action data?
– What can be carried over into practical applications?
– Data-proven reliability.

Preliminaries
• A recommender system assumes:
– A set of users, U = {u1, u2, …, un}
– A set of items, I = {i1, i2, …, im}
– Each user u acts on a set of items:
Iu = {iu1, iu2, …, iuk}
– The action of user u on item i is denoted by Au,i

Preliminaries: Trust Network
• Additionally, a trust-based system has a trust network among users:
– Tu: the set of users that u trusts directly
– tu,v: a real number in [0,1] denoting how much u trusts v, for v ∈ Tu
• The trust network can be represented as a
directed graph G = <U, T>
• T = {(u, v) | u ∈ U, v ∈ Tu}
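The notation on these two slides maps onto a few plain containers. The following is only a minimal sketch: the user names, item names, and numeric values are invented for illustration, and `trust_edges` is a hypothetical helper name.

```python
# Toy data only: every name and number below is illustrative.
users = {"u1", "u2", "u3"}
items = {"i1", "i2", "i3"}

# actions[u][i] holds A_{u,i}; the keys of actions[u] form I_u.
actions = {
    "u1": {"i1": 1.0, "i2": 1.0},
    "u2": {"i2": 1.0},
    "u3": {"i3": 1.0},
}

# trust[u][v] holds t_{u,v} in [0, 1]; the keys of trust[u] form T_u.
trust = {
    "u1": {"u2": 0.9},
    "u2": {"u1": 0.4, "u3": 0.7},
    "u3": {},
}


def trust_edges(trust):
    """Return the edge set T = {(u, v) | u in U, v in T_u} of the directed graph G."""
    return {(u, v) for u, neighbors in trust.items() for v in neighbors}


edges = trust_edges(trust)
```

Because the graph is directed, (u1, u2) and (u2, u1) are separate edges and may carry different trust values.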

Network Model of User Trust and
Actions
[Figure: two-layer network model with a User Layer and an Item Layer]

Recommendation :
Collaborative Filtering for Rating Value
• Common task of recommendation:
– Given a user u ∈ U and an item i ∈ I
– For an unknown action, predict the action value (rating
stars in [0,5]) of user u on item i.
• Is “value prediction” what the user wants?
– Tractable to compare and optimize.
– NOT practical or user-friendly
– Serendipity

Problem Definition -
Top-N Item Recommendation
• Given a target user u
• Recommend a set of items Îu where |Îu| ≤ N
and Îu ∩ Iu = Ø
– Once produced, the rank within the set does NOT
matter anymore.
• Verify whether the testing item îu is
contained in the resulting item set.
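The task above can be sketched in a few lines. This is an illustration under assumptions, not the speaker's implementation: `scores` is a hypothetical mapping from item to any relevance score, and `top_n` and `is_hit` are made-up names.

```python
def top_n(scores, seen_items, n):
    """Return up to n items the user has not acted on, chosen by highest score.

    A set is returned because the rank inside the list no longer matters.
    """
    candidates = [(item, s) for item, s in scores.items() if item not in seen_items]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return {item for item, _ in candidates[:n]}


def is_hit(recommended, test_item):
    """True when the held-out test item appears in the recommended set."""
    return test_item in recommended


# Illustrative scores for one user who has already acted on item "a".
scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.5}
recommended = top_n(scores, seen_items={"a"}, n=2)
```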

Outline
• Introduction
• Related Works
– Item-based CF
– Random Walk Recommendation
– TrustWalker
– Influence Probabilities
• Cold Start Problem
• Random Walk and Probability Assignment
• Algorithm

Related Work – Item-based CF
• Based on similarity between items or users
• Simply predict by the weighted sum over similar
items. (ex: 5*0.2 + 4*0.3 + 3*0.5 = 3.7)
• Take the n highest-rated items as the top-N list
[1] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based
collaborative filtering recommendation algorithms. In Proceedings of the 10th international
conference on World Wide Web (WWW '01).
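The weighted-sum example on this slide can be reproduced directly. The sketch below assumes the common normalization by the sum of the weights (the slide's weights already sum to 1, so the result is unchanged); `predict_rating` is a hypothetical name.

```python
def predict_rating(neighbor_ratings, similarities):
    """Similarity-weighted average of the ratings of similar items."""
    total = sum(similarities)
    if total == 0:
        return None  # no similar item carries any weight
    weighted = sum(r * s for r, s in zip(neighbor_ratings, similarities))
    return weighted / total


# Ratings 5, 4, 3 on three similar items with weights 0.2, 0.3, 0.5.
prediction = predict_rating([5, 4, 3], [0.2, 0.3, 0.5])
```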

Related Works –
Random Walk Recommendation
[3] Yildirim, Hilmi, and Mukkai S. Krishnamoorthy. "A random walk method for alleviating
the sparsity problem in collaborative filtering." Proceedings of the 2008 ACM conference on
Recommender systems. ACM, 2008.

Random Walk Recommendation
– Three components
1. Build the item graph, which captures the similarity
between items
2. Compute the rank values of items for each user
by simulating a random walk
3. Interpret and scale the rank scores as ratings
for each user-item pair.
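The first two components can be sketched as follows. This is a simplified illustration, not the exact formulation of [3]: it assumes a row-normalized similarity matrix as the transition matrix, a fixed continue probability of 0.8, and a finite number of steps; all function names are hypothetical.

```python
def row_normalize(sim):
    """Turn an item-item similarity matrix into a row-stochastic transition matrix."""
    out = []
    for row in sim:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def walk_scores(start, transition, cont=0.8, steps=10):
    """Accumulate the damped visit probability of every item over the walk.

    start: initial distribution over items (e.g. the items a user acted on).
    cont:  probability of continuing the walk at each step (assumed value).
    """
    n = len(start)
    current = list(start)
    scores = [0.0] * n
    for _ in range(steps):
        current = [
            cont * sum(current[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        scores = [s + c for s, c in zip(scores, current)]
    return scores


# Items 0 and 2 share no direct similarity but are both similar to item 1.
sim = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
T = row_normalize(sim)
one_step = walk_scores([1, 0, 0], T, steps=1)
two_steps = walk_scores([1, 0, 0], T, steps=2)
```

After one step item 2 still scores 0, while after two steps it receives weight through item 1, which is the transitive association a sparse similarity matrix misses.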

Related Works - TrustWalker
• Combines user-based (trust) recommendation with item-based recommendation; walks are
repeated until the variance of the results converges.
• Starts from source user u0; at step k, at node u:
– If u has rated i, return ru,i
– With probability Φu,i,k, the random walk stops
• Randomly select an item j rated by u and return ru,j.
– With probability 1 - Φu,i,k, continue the random walk to a direct neighbor of u.
• Three ways to stop:
1. Reaching a node uk who has expressed an action on item i
2. Deciding to stay at user uk and selecting one of the items rated by uk
3. Reaching the maximum depth, max-depth = 6 (by “six degrees of separation”)
[5] Mohsen Jamali and Martin Ester. 2009. "TrustWalker: a random walk model for
combining trust-based and item-based recommendation." In Proceedings of the 15th ACM
SIGKDD international conference on Knowledge discovery and data mining (KDD '09).
[4] Mohsen Jamali and Martin Ester. "Using a trust network to improve top-N
recommendation." Proceedings of the third ACM conference on Recommender systems. ACM, 2009.
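A single walk following the three stopping rules on this slide might look like the sketch below. It is a simplification under assumptions: the stopping probability Φ is passed in as a caller-supplied function `stop_prob` (its form is not given on the slide), items and neighbors are picked uniformly at random, and all names are hypothetical.

```python
import random


def trust_walk(source, item, ratings, trust, stop_prob, rng, max_depth=6):
    """Run one walk and return a rating, or None if the walk finds nothing.

    ratings:   {user: {item: rating}}
    trust:     {user: {trusted_user: trust_value}}
    stop_prob: function (user, item, k) -> probability of stopping at step k
    """
    user = source
    for k in range(max_depth + 1):
        rated = ratings.get(user, {})
        if item in rated:
            return rated[item]  # stop 1: this user acted on the target item
        if rated and rng.random() < stop_prob(user, item, k):
            # stop 2: stay here and use another item rated by this user
            return rated[rng.choice(sorted(rated))]
        neighbors = sorted(trust.get(user, {}))
        if not neighbors:
            return None  # nowhere left to walk
        user = rng.choice(neighbors)  # continue to a directly trusted user
    return None  # stop 3: maximum depth reached


ratings = {"a": {}, "b": {"x": 4}}
trust = {"a": {"b": 1.0}, "b": {}}
rng = random.Random(0)
found = trust_walk("a", "x", ratings, trust, lambda u, i, k: 0.0, rng)
```

In practice many such walks would be run and their results aggregated; the slide's convergence criterion on the variance is not shown here.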

Related Works-
Influence Probabilities
• Aimed at the Influence Maximization problem
– Find the influence between each user pair.
– Assume influence probabilities do NOT remain
constant over time:
Exponential Decay
• Dataset Difference
– Yahoo! Flickr dataset
– “Joining a group” (?!) is considered an action
– e.g. user “James” joined “Whistler Mountains” at
timestamp 5.
[6] Goyal, Amit, Francesco Bonchi, and Laks VS Lakshmanan. "Learning influence
probabilities in social networks." Proceedings of the third ACM international conference
on Web search and data mining. ACM, 2010.

Learning Influence Probabilities:
the Models
• Parameters to learn:
– #actions performed by each user – Au
– #actions propagated via each edge – Av2u
– Mean life time – τv,u
• Example action log (user, action, timestamp):
P a1 5
Q a1 10
R a1 15
Q a2 12
R a2 14
R a3 6
P a3 14
[Figure: worked example over users P, Q, R from [6], showing the per-user counts Au, the per-edge pairs (Av2u, τv,u), and the resulting influence models with edge values 0.33, 0, 0, 0.5, 0.5, 0.2]
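The counting step behind these parameters can be sketched on the slide's action log. The code below is an assumption-laden illustration: it treats every pair of users as connected (the social graph of the original figure is not recoverable), ignores the time-decay term, and estimates the static probability as Av2u / Av; `learn_influence` is a hypothetical name.

```python
from collections import defaultdict


def learn_influence(log):
    """log: iterable of (user, action, time). Returns {(v, u): A_v2u / A_v}."""
    actions_by_user = defaultdict(int)  # A_u
    by_action = defaultdict(list)
    for user, action, t in log:
        actions_by_user[user] += 1
        by_action[action].append((t, user))

    propagated = defaultdict(int)  # A_v2u
    for performers in by_action.values():
        performers.sort()
        for idx, (t_v, v) in enumerate(performers):
            for t_u, u in performers[idx + 1:]:
                if t_u > t_v:  # u performed the action strictly after v
                    propagated[(v, u)] += 1

    return {
        edge: count / actions_by_user[edge[0]]
        for edge, count in propagated.items()
    }


# Action log copied from the slide.
log = [
    ("P", "a1", 5), ("Q", "a1", 10), ("R", "a1", 15),
    ("Q", "a2", 12), ("R", "a2", 14),
    ("R", "a3", 6), ("P", "a3", 14),
]
probs = learn_influence(log)
```

Under these assumptions the estimates need not match the values in the original figure, which may rest on a different set of edges or on a different model.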

Outline
• Introduction
• Related Works
• Cold Start:
• Algorithm

Cold Start Problem
• Similarity matrices are usually too sparse to capture the actual dependencies
between items.
– If item i has not been rated by any user who has rated item j, their similarity score is 0
– However, these items would still be found close to each other if another item t is
similar to both.
• The Random Walk Recommender captures these transitive associations at
various levels, depending on the length of the random walk.
• The length of the walk is parameterized according to the sparsity level of the rating
matrix through the continue probability (typically 0.8~0.85)
• Cold Start User
– User with few actions and plenty of relations/friends
– User with plenty of actions and few relations/friends
• Not a cold user -> traditional CF works best!
• Newcomer with few actions and few relations
• Use a sigmoid function and alpha to balance the ratio between user similarity and
social influence

Consider the State-of-the-art
Recommendation
• The Matrix Factorization method [2] still dominates
if you only care about value accuracy:
– Highly effective: learns from the training dataset
– Low efficiency: high complexity and memory costs
– No quality indicator and no source explainability
– “Latent” factors lack physical meaning
– Centralized information is needed.
• Network Method
– Based on neighborhood similarity: distributed
– Random walk with lower complexity
– Feasible to update immediately

User Similarity vs. Node Distance
• Uni-partite previous
• Katz centrality with penalty beta
• Similar to PageRank
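The Katz measure with penalty beta can be sketched as a damped count of walks between two nodes. This is a truncated version for illustration only: the maximum walk length `max_len` and the default `beta` are assumed values, and `katz_score` is a hypothetical name.

```python
def katz_score(adj, source, target, beta=0.5, max_len=4):
    """Sum beta**l times the number of length-l walks from source to target."""
    n = len(adj)
    # walks[j] = number of walks of the current length from source to j
    walks = [1.0 if j == source else 0.0 for j in range(n)]
    score = 0.0
    for length in range(1, max_len + 1):
        walks = [sum(walks[i] * adj[i][j] for i in range(n)) for j in range(n)]
        score += (beta ** length) * walks[target]
    return score


# Path graph 0 - 1 - 2: nodes 0 and 2 are only connected through node 1.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
direct = katz_score(adj, 0, 1, beta=0.5, max_len=2)
indirect = katz_score(adj, 0, 2, beta=0.5, max_len=2)
```

A smaller beta penalizes long walks more strongly, so distant nodes contribute less to the score.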

Outline
• Introduction
• Related Works
• Cold Start:
• Algorithms
– Item-based Random Walk
– User-based Random Walk
– Influence-based Random Walk

Item-Based Random Walk (ItemRW)
• Construct the item-based similarity matrix
– By Jaccard index
• The random walk process:
– Denote by Yu,i the random variable for selecting an item j
among the items rated by u that is similar to i.
– Generalized by a sigmoid function, with the number of
common neighbors as the exponent
Liben‐Nowell, David, and Jon Kleinberg. "The link‐prediction problem for social
networks." Journal of the American society for information science and
technology 58.7 (2007): 1019-1031.
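Building the item-item similarity with the Jaccard index can be sketched as follows. The input layout (a mapping from user to the set of items acted on) and the function names are assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def item_similarity(actions):
    """actions: {user: set of items}. Returns {(i, j): Jaccard of their user sets}."""
    users_of = {}
    for user, items in actions.items():
        for item in items:
            users_of.setdefault(item, set()).add(user)
    names = sorted(users_of)
    return {
        (i, j): jaccard(users_of[i], users_of[j])
        for i in names
        for j in names
        if i < j
    }


# Illustrative toy data.
actions = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"b", "c"}}
sim = item_similarity(actions)
```

Items "a" and "c" share no user, so their similarity is 0 even though both co-occur with "b"; this is the sparsity that the random walk is meant to bridge.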

User-Based Random Walk (UserRW)
• Construct the user-based similarity
• The random walk process:
– Denote by Xu,i the random variable for selecting a
user v among all users that is similar to u.
– Pick a nearest neighbor v and output the action
set of v.

Influence-Based Random Walk
Algorithm
1. Build the item and user graphs with correlations
2. Learn influence power by parsing the trust corpus
3. Perform a random walk on the graph to get the rank list.
• To perform a random walk, the needed information can be
acquired per user request, in a distributed manner.
• To validate the algorithm, we compute the expected value and
sort the state probabilities of each item.
– Most of them remain 0 -> no need to parse the full item vector I to perform
the matrix operation
[Figure: pipeline: Build Graph -> Learning Influence (updated) -> Random Walk to produce rank list]

Learning Influence - Graph
[Figure: User Layer and Item Layer, with the action records {u1, i1, t1} and {u2, i1, t2}; u1 takes action i1 at timestamp t1]
• Influence Power: △t = (t2 - t1)
Source: Goyal, Bonchi, and Lakshmanan [6]

Sigmoid Smoothing
• Adjust the weight for items with fewer related items
• A sigmoid function is a mathematical
function having an "S" shape (sigmoid
curve). Often, sigmoid function refers to the
special case of the logistic function,
defined by the formula S(t) = 1 / (1 + e^(-t))
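One way such smoothing could be applied is sketched below. The scaling of the exponent by 1/2 and the choice to multiply the raw similarity by the sigmoid are assumptions for illustration; the slide does not preserve the exact expression, and `smoothed_similarity` is a hypothetical name.

```python
import math


def sigmoid(t):
    """Logistic function S(t) = 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def smoothed_similarity(raw_similarity, common_neighbors):
    """Damp a similarity that rests on only a few common neighbors.

    The factor approaches 1 as the number of common neighbors grows,
    so well-supported similarities are left almost unchanged.
    """
    return raw_similarity * sigmoid(common_neighbors / 2.0)


weak = smoothed_similarity(0.8, 1)
strong = smoothed_similarity(0.8, 10)
```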

Influence-Based
Random Walk
[Figure: User Layer with target user u and users u1, u2, u4, u5, labelled Φu1,i, Φu2,i, Φu4,i, Φu5,i; item i and item j linked by simi,j]

Influence-based
User Transition Probabilities
• Influence-based User Random Walk
Probability:
α*user-similarity(u,v) + (1-α)*Influence Power(u,v)
Yildirim, Hilmi, and Mukkai S. Krishnamoorthy. "A random walk method for alleviating
the sparsity problem in collaborative filtering." Proceedings of the 2008 ACM
conference on Recommender systems. ACM, 2008.
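The blended weight α*user-similarity(u,v) + (1-α)*Influence Power(u,v) can be turned into transition probabilities as sketched below. Normalizing the weights so that they sum to 1 is an assumption, as are the dictionary layout and the function name.

```python
def transition_probabilities(u, similarity, influence, alpha):
    """Blend both signals for every candidate v and normalize to sum to 1.

    similarity: {u: {v: user-similarity(u, v)}}
    influence:  {u: {v: influence power(u, v)}}
    """
    candidates = set(similarity.get(u, {})) | set(influence.get(u, {}))
    weights = {
        v: alpha * similarity.get(u, {}).get(v, 0.0)
        + (1 - alpha) * influence.get(u, {}).get(v, 0.0)
        for v in candidates
    }
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()} if total else {}


# Illustrative values: v1 is similar to u, v2 influences u.
similarity = {"u": {"v1": 0.2, "v2": 0.0}}
influence = {"u": {"v2": 0.5}}
probs = transition_probabilities("u", similarity, influence, alpha=0.5)
```

With α = 1 only similarity drives the walk, and with α = 0 only influence power does.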

Outline
• Introduction
• Related Works
• Cold Start:
• Algorithms
• Temporal Decay Issues
– ItemBetw
– PastDecay

Exponential Time Decay Function
Dunlavy, Daniel M., Tamara G. Kolda, and Evrim Acar. "Temporal link prediction
using matrix and tensor factorizations." ACM Transactions on Knowledge Discovery
from Data (TKDD) 5.2 (2011): 10.

Time Interval Analysis – ItemBetw
[Figure: User Layer and Item Layer; user u takes action i1 at timestamp t1 and i2 at timestamp t2, giving the records {u, i1, t1} and {u, i2, t2}]

Time Interval – ItemBetw
• By assumption:
items that a user acted on within a short
interval of each other gain higher similarity
“All of you items were my favorites in the past…”
The similarity of items that the user acted on a long
interval apart becomes close to 0
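The ItemBetw idea can be sketched as a decayed co-occurrence count. The decay form `decay ** |t1 - t2|` is an assumption: the slide's decay formula is not preserved, and this form is only suggested by the decay parameters reported in the result slides, where 1 corresponds to no decay. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def item_betw_similarity(user_actions, decay=0.99):
    """user_actions: {user: [(item, timestamp), ...]}. Returns {(i, j): weight}.

    Each pair of items acted on by the same user contributes
    decay ** |t_i - t_j| (assumed form), so close-in-time pairs weigh more.
    """
    sim = defaultdict(float)
    for actions in user_actions.values():
        for (i, t_i), (j, t_j) in combinations(actions, 2):
            if i != j:
                pair = tuple(sorted((i, j)))
                sim[pair] += decay ** abs(t_i - t_j)
    return dict(sim)


# Illustrative log: "a" and "b" are close in time, "c" is far from both.
user_actions = {"u": [("a", 0), ("b", 1), ("c", 100)]}
sim = item_betw_similarity(user_actions, decay=0.5)
flat = item_betw_similarity(user_actions, decay=1.0)
```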

Time Interval Analysis –
PastDecay
[Figure: User Layer and Item Layer; user u takes action i at timestamp t1, j at timestamp t2, and k at timestamp t3, giving the records {u, i, t1}, {u, j, t2}, {u, k, t3}]

Time Interval - PastDecay
• By assumption:
items that the user acted on a short time before
now gain higher similarity
The newer, the better !!!
The weight of items that the user acted on a long
time ago becomes close to 0
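PastDecay can be sketched as a weight on each past action that depends on its age. As with ItemBetw, the form `decay ** (now - t)` is an assumption suggested by the decay parameters in the result slides, not a formula taken from the deck; `past_decay_weights` is a hypothetical name.

```python
def past_decay_weights(actions, now, decay=0.99):
    """actions: [(item, timestamp), ...] of one user. Returns {item: weight}.

    Recent actions keep a weight near 1, old actions fade toward 0
    (for decay < 1). If an item appears twice, its largest weight is kept.
    """
    weights = {}
    for item, t in actions:
        w = decay ** (now - t)
        weights[item] = max(w, weights.get(item, 0.0))
    return weights


# Illustrative log: i is the oldest action, k the most recent.
actions = [("i", 1), ("j", 5), ("k", 9)]
weights = past_decay_weights(actions, now=10, decay=0.5)
flat = past_decay_weights(actions, now=10, decay=1.0)
```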

Outline
• Introduction
• Related Works
• Cold Start:
• Algorithms

Credibility & Accordance
• What evidence can we use to examine the recommendation
quality of an algorithm?
– The rank of the testing item in our rank list!
– The best case is rank = 1; presented as an average percentage:
Rank 3 out of top-15 => credibility u,i = 20%
– A metric for a Recommender System/Method
• Select the highest probability of a related item/user
as the reference
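The slide's example (rank 3 out of top-15 gives 20%) suggests credibility is the rank divided by the list size. The sketch below follows that reading, which is an interpretation rather than a stated definition; both function names are hypothetical.

```python
def rank_of(ranked_items, test_item):
    """1-based position of test_item in the rank list, or None if absent."""
    try:
        return ranked_items.index(test_item) + 1
    except ValueError:
        return None


def credibility(rank, n):
    """Rank of the test item (1 = best) as a fraction of the top-n list."""
    if not 1 <= rank <= n:
        raise ValueError("rank must lie within the top-n list")
    return rank / n


example = credibility(3, 15)
```

Under this reading a smaller percentage means the test item sits closer to the top of the list.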

Outline
• Introduction
• Related Works
• Cold Start:
• Algorithms
• Temporal Decay Issues
• Credibility & Accordance
• Experiment Design
– ALL-BUT-ONE Evaluation
– Dataset Description
– Experimental Result

ALL-BUT-ONE Evaluation
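• Also called the “leave-one-out” method
• Predict the last item i that target user u acted on
• Output the top-N list; if the held-out item is contained in it, call
it a HIT
• Recall (hit ratio) = #HITs / L, with L the testing set size.
[Figure: Item Layer with user u's actions {u, i1, t1}, {u, i2, t2}, {u, i3, t3}; the last action {u, i?, tlast} is the one to predict]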
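The evaluation loop can be sketched as follows. The `most_popular` recommender is only a stand-in so that the example runs; it is not one of the methods compared in the deck, and all names and the toy data are hypothetical.

```python
from collections import Counter


def leave_one_out_recall(user_actions, recommend, n):
    """Hold out each user's last action and count how often it is recommended.

    user_actions: {user: [(item, timestamp), ...]}
    recommend:    function (train, user, n) -> iterable of recommended items
    Returns hits / L, where L is the number of tested users.
    """
    hits = 0
    tested = 0
    for user, actions in user_actions.items():
        if len(actions) < 2:
            continue  # nothing left to train on for this user
        ordered = sorted(actions, key=lambda pair: pair[1])
        held_out = ordered[-1][0]
        train = {
            u: (ordered[:-1] if u == user else a)
            for u, a in user_actions.items()
        }
        tested += 1
        if held_out in recommend(train, user, n):
            hits += 1
    return hits / tested if tested else 0.0


def most_popular(train, user, n):
    """Stand-in recommender: the n most frequent items the user has not seen."""
    seen = {item for item, _ in train[user]}
    counts = Counter(
        item for acts in train.values() for item, _ in acts if item not in seen
    )
    return [item for item, _ in counts.most_common(n)]


data = {
    "u1": [("a", 1), ("b", 2)],
    "u2": [("a", 1), ("b", 3)],
    "u3": [("c", 1), ("d", 2)],
}
recall = leave_one_out_recall(data, most_popular, n=3)
```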

Dataset of Experiment
• Bookmark data
• 68,215 bookmark URLs from 1,867 users
• Friendship (“become mutual fans”) with timestamp
information:
<source_user, target_user, timestamp>
• Actions also carry timestamps, to measure the influence
of the time interval:
<user, item, timestamp>

Dataset Description
• The social degree of nodes (trust) follows a
power-law distribution.
[Figure: “User Action Times Distribution”; axis labels: #(Item to Tag), #Days]
[Figure: “User Degree of Trust Distribution”; axis labels: #Social Degree, #Users]

Experiment - Learning Influence
• User-Based Similarity:
– Average correlation by Jaccard index: 2.58%
– Average correlation within mutual trust: 8.28% (about 3.2 times the average!)
[Figure: histogram “User-based Similarity”; similarity bins from 0.001 to 0.191]
[Figure: histogram “User-based Similarity – with Mutual Trust”; similarity bins from 0.001 to 0.199]

Experimental Result
[Figure: “Recall and Top-K Size”, MIN_ITEM_FOR_USER > 5; top-K from 10 to 100; series: user-based, influence-based, itemBased, itemEnhanced, relational, popular]
[Figure: “Recall and Top-K Size”, MIN_ITEM_FOR_USER > 1; top-K from 10 to 100; series: userbased, influenceBased, itemBased, itemEnhanced, TrustWalker, popular]

Result for Cold Start User
[Figure: “Recall for Cold Start User with action-item < 10”; y-axis: Hit Ratio (%); top-K from 10 to 100; series: item-based RW, user-based RW, Influence, itemAdjust, TrustWalker]

Results for the
Global/Friendship Ratio α
α*user-similarity(u,v) + (1-α)*Influence Power(u,v)
[Figure: “Recall for Cold Start User with Action-item < 10”; y-axis: Hit Ratio (%); x-axis: 1 to 10; series: alpha = 0.1, alpha = 0.01, alpha = 0.001, alpha = 0.0001]

Time Interval Decay Result -
ItemBetw
• Setting the decay function to the constant 1 gives the
best performance!
[Figure: “Time ItemBetw Top-K Curve”; y-axis: Hit Ratio (%); decay values 1.5, 1.1, 1.05, 1.01, 1, 0.99, 0.95, 0.9, 0.8, 0.7]
[Figure: “Time ItemBetw Top-K Curve”; top-K from 10 to 90; one series per decay value]

Time Interval Decay Result -
PastDecay
[Figure: “TimeDecay”; top-K from 10 to 100; series: decay values 1.5, 1.1, 1.05, 1.01, 1, 0.99, 0.95, 0.9, 0.8]
[Figure: “TimeDecay Top-k Curve”; decay values 1.5, 1.1, 1.05, 1.01, 1, 0.99, 0.95, 0.9, 0.8, 0.7]

Outline
• Introduction
• Related Works
• Cold Start:
• Algorithms

Discussion - Why TrustWalker Fails?
• TrustWalker puts more emphasis on locally
trusted users than on globally similar users.
• It minimizes the Mean Square Error:
– Similar to a non-personalized popular list
• As mentioned, a top-N result is more user-friendly
• The TrustWalker experiment used the Epinions dataset:
– Users become fans of experts and columnists
– Trust > Global similarity

Discussion:
Influence Based Random Walk
• For α near 0.001
– User similarity is on a different scale
• Like a decision tree:
– Similarity would be the primary and
influence power the secondary
comparison metric
[Figure: comparison of sim(u,v) and Influence Power(u,v)]

Discussion: Time Interval Decay
• Performance peaks when all the data keep the same
weight regardless of time.
• “In a predefined dataset, you should not easily
abandon or underestimate the value of old data.”

Conclusion
• Propose a novel method based on influence.
– Influence-based Random Walk
– Intersection of item and user
• Probe and leverage influence probabilities and
user correlation for cold start users
• Provide Credibility and Accordance for user
experience and feedback in RS
• Analyze time decay with two decay
functions
– PastDecay
– ItemBetw

Q&A
Thanks for Your Attention!
