Random Walk by User Trust and Temporal Issues toward Sparsity Problem in Social Tagging Recommender Systems
1. Random Walk by User Trust and Temporal
Issues toward Sparsity Problem in Social
Tagging Recommender Systems
2013-05-13
Speaker: Yan Kai Huang
NTU Internet Research Lab
2. Outline
• Introduction
• Related Works
• Cold Start:
– Random Walk and Probability Assignment
• Algorithms
• Temporal Decay Issues
• Credibility & Accordance
• Experiment Design
• Discussion and Conclusion
3. Introduction to Recommender Systems
• Recommendation systems (RS) help to match users
with items
– Ease information overload
– Sales assistance (guidance, advisory, persuasion,…)
• Collaborative Filtering
– Considers Users with Similar Rating Patterns
– Aggregates the ratings of Similar Users
• Social Networks Emerged Recently
– Independent source of information
• Motivations of Trust-based RS
– Social Influence: users adopt the behavior of their
friends
4. Motivation
• User-generated data is obtained from an existing
website
– instead of from a random graph generator
– e.g. the ER, BA, and WS models
– These generators are unable to generate uni-partite graphs,
not to mention bipartite ones.
• “Knowledge discovery”
– What are the characteristics of user-action data?
– What can be turned into practical
applications?
– Data-proven reliability.
5. Preliminaries
• Recommender system assumes:
– A set of users, U = {u1, u2…un}
– A set of items, I = {i1, i2… im}
– Each user u performs actions on a set of items:
Iu = {iu1, iu2… iuk}
– The action of user u on item i is denoted by Au,i
6. Preliminaries: Trust Network
• Additionally, there is a trust network among
users in trust-based system:
tu,v: a real number in [0,1] denoting how much u trusts v, for each v ∈ Tu (the users trusted by u).
• The trust network can be represented as a
directed graph G = <U, T>
• T={ (u, v) | u ∈ U, v ∈ Tu}
8. Recommendation :
Collaborative Filtering for Rating Value
• Common task of recommendation:
– Given a user u ∈ U and an item i ∈ I
– For an unknown action, predict the action value (rating
stars in [0,5]) of user u on item i.
• Is “value prediction” what the user wants?
– Tractable to compare and optimize.
– Neither practical nor user-friendly.
– Serendipity
9. Problem Definition -
Top-N Item Recommendation
• Given a target user u
• Recommend a set of items Îu where | Îu | ≤ N
and Îu ∩ Iu = Ø
– Once produced, the rank within the set does NOT
matter anymore.
• Verify whether the testing item îu is
contained in the resulting item set.
10. Outline
• Introduction
• Related Works
– Item-based CF
– Random Walk Recommendation
– TrustWalker
– Influence Probabilities
• Cold Start Problem
• Random Walk and Probability Assignment
• Algorithm
• Temporal Decay Issues
• Credibility & Accordance
• Experiment Design
• Discussion and Conclusion
11. Related Work – Item-based CF
• Based on similarity between items or users
• Predict simply by a weighted sum over similar
items. (ex: 5*0.2+4*0.3+3*0.5 = 3.7)
• Take the N items with the highest predicted ratings as the top-N
[1] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based
collaborative filtering recommendation algorithms. In Proceedings of the 10th international
conference on World Wide Web (WWW '01).
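The weighted-sum prediction and the top-N selection on the item-based CF slide can be sketched as follows. This is a minimal illustration, not the implementation from [1]: the function names are hypothetical, and dividing by the sum of similarities is an assumption (in the slide's example the weights already sum to 1).

```python
def predict_rating(neighbors):
    """Weighted sum of a user's ratings on items similar to the target item.

    neighbors: list of (rating, similarity) pairs. Similarities are
    normalized by their sum, so they need not add up to 1.
    """
    total_sim = sum(sim for _, sim in neighbors)
    if total_sim == 0:
        return 0.0  # assumed fallback when no similar item was rated
    return sum(rating * sim for rating, sim in neighbors) / total_sim


def top_n(predictions, n):
    """Return the n item ids with the highest predicted rating."""
    return sorted(predictions, key=predictions.get, reverse=True)[:n]


# The slide's example: ratings 5, 4, 3 with similarities 0.2, 0.3, 0.5.
example = predict_rating([(5, 0.2), (4, 0.3), (3, 0.5)])
```

Calling `top_n` on a dictionary of predicted ratings then yields the recommendation list for one user.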
12. Related Works –
Random Walk Recommendation
[3] Yildirim, Hilmi, and Mukkai S. Krishnamoorthy. "A random walk method for alleviating
the sparsity problem in collaborative filtering." Proceedings of the 2008 ACM conference on
Recommender systems. ACM, 2008.
13. Random Walk Recommendation
– Three components
1. Build the item graph, which captures the similarity
between items
2. Compute the rank values of items for each user
by simulating a random walk
3. Interpret and scale the rank
scores as ratings for each user-item pair.
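The first two components can be sketched in a few lines. This is a simplified illustration, not the method of [3]: it truncates the walk after a fixed number of steps, the continue probability `beta` and the 3-item graph are hypothetical, and the third component (scaling scores to ratings) is left out.

```python
def normalize_rows(sim):
    """Turn an item-item similarity matrix into transition probabilities."""
    rows = []
    for row in sim:
        s = sum(row)
        rows.append([x / s if s else 0.0 for x in row])
    return rows


def walk_scores(user_vector, transition, beta=0.8, steps=10):
    """Accumulate beta^k * (user_vector * P^k) for k = 1..steps."""
    n = len(transition)
    scores = [0.0] * n
    state = list(user_vector)
    weight = 1.0
    for _ in range(steps):
        # One step of the walk: redistribute the current mass along edges.
        state = [sum(state[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
        weight *= beta
        scores = [s + weight * x for s, x in zip(scores, state)]
    return scores


# Hypothetical item graph: items 0-1 and 1-2 are similar, 0-2 are not.
sim = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
P = normalize_rows(sim)
scores = walk_scores([1.0, 0.0, 0.0], P)  # the user acted only on item 0
```

Item 2 has no direct similarity to item 0, yet it receives a positive score through item 1, which is the kind of transitive association the random walk is meant to capture.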
14. Related Works - TrustWalker
• Combines user-based (trust) and item-based recommendation; random walks
are repeated until the variance of the results converges.
• Starts from source user u0; at step k, at node u:
– If u has rated i, return ru,i
– With probability Φu,i,k, the random walk stops
• Randomly select an item j rated by u and return ru,j.
– With probability 1 − Φu,i,k, continue the random walk to a direct neighbor of u.
• Three ways to stop:
1. Reaching a node uk who has expressed an action on item i
2. Deciding to stay at user uk and selecting one of the items rated by uk
3. Reaching the maximum depth, max-depth = 6 (by “six degrees of separation”)
[5] Mohsen Jamali and Martin Ester. 2009. ”TrustWalker: a random walk model for
combining trust-based and item-based recommendation.” In Proceedings of the 15th ACM
SIGKDD international conference on Knowledge discovery and data mining (KDD '09).
[4] Mohsen Jamali and Martin Ester. "Using a trust network to improve top-N
recommendation." Proceedings of the third ACM conference on Recommender systems. ACM, 2009.
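A single walk following the three stopping rules listed on the TrustWalker slide might look like the sketch below. It is an illustration under assumptions, not the code of [5]: the stopping probability is passed in as a hypothetical `phi` callable, the item at a stopping user is picked uniformly at random as the slide states, and giving up at a user with no trusted neighbors is an added assumption.

```python
import random


def trust_walk(source, item, ratings, trusted, phi, max_depth=6, rng=random):
    """Run one random walk; return a rating, or None if the walk fails.

    ratings: dict user -> dict item -> rating
    trusted: dict user -> list of directly trusted users
    phi:     function (user, item, step) -> stopping probability
    """
    user = source
    for step in range(max_depth + 1):
        rated = ratings.get(user, {})
        if item in rated:                       # stop 1: rating found
            return rated[item]
        if rated and rng.random() < phi(user, item, step):
            other = rng.choice(sorted(rated))   # stop 2: stay at this user
            return rated[other]
        neighbors = trusted.get(user, [])
        if not neighbors:                       # dead end (assumed: give up)
            return None
        user = rng.choice(neighbors)            # continue the walk
    return None                                 # stop 3: depth limit reached


# Hypothetical data: a trusts b, and b rated item x with 4 stars.
ratings = {"b": {"x": 4}}
trusted = {"a": ["b"]}
result = trust_walk("a", "x", ratings, trusted, phi=lambda u, i, k: 0.0)
```

In practice many such walks would be run and their results aggregated until the variance converges, as the slide notes.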
15. Related Works-
Influence Probabilities
• Toward the Influence Maximization problem
– Find the influence between each pair of users.
– Assume influence probabilities do NOT remain
constant over time:
exponential decay
• Dataset difference
– Yahoo! Flickr dataset
– “Joining a group” is considered an action
– e.g. user “James” joined “Whistler Mountains” at
timestamp 5.
[6] Goyal, Amit, Francesco Bonchi, and Laks VS Lakshmanan. "Learning influence
probabilities in social networks." Proceedings of the third ACM international conference
on Web search and data mining. ACM, 2010.
16. Learning Influence Probabilities
the Models
• Parameters to learn:
– #actions performed by each user – Au
– #actions propagated via each edge – Av2u
– Mean life time
• Action log:
User  Action  Time
P     a1      5
Q     a1      10
R     a1      15
Q     a2      12
R     a2      14
R     a3      6
P     a3      14
[Tables: action counts Au for users P, Q, R; a matrix over P, Q, R of per-edge counters built from the action log]
[Figure: influence models, a graph over users P, Q, R with edge influence probabilities 0.33, 0, 0, 0.5, 0.5, 0.2]
Source: Goyal et al. [6]
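The counters Au and Av2u can be turned into influence probabilities with the static Bernoulli estimate Av2u / Av described in [6]. The sketch below applies it to the action log of this slide; the assumption that every ordered pair of users is connected is mine, since the slide's social graph did not survive, and the function name is hypothetical.

```python
from collections import defaultdict

# Action log from the slide: (user, action, timestamp).
LOG = [("P", "a1", 5), ("Q", "a1", 10), ("R", "a1", 15),
       ("Q", "a2", 12), ("R", "a2", 14),
       ("R", "a3", 6), ("P", "a3", 14)]

# Assumption: every ordered pair of users is an edge of the social graph.
EDGES = {(v, u) for v in "PQR" for u in "PQR" if v != u}


def learn_influence(log, edges):
    """Return {(v, u): A_v2u / A_v} for each edge v -> u."""
    actions_by_user = defaultdict(int)   # A_u
    times = defaultdict(dict)            # action -> {user: timestamp}
    for user, action, t in log:
        actions_by_user[user] += 1
        times[action][user] = t
    propagated = defaultdict(int)        # A_v2u
    for who in times.values():
        for v, u in edges:
            # The action propagates from v to u if v performed it first.
            if v in who and u in who and who[v] < who[u]:
                propagated[(v, u)] += 1
    return {(v, u): propagated[(v, u)] / actions_by_user[v]
            for v, u in edges if actions_by_user[v]}


probs = learn_influence(LOG, EDGES)
```

The time-dependent models in [6] additionally use the mean life time to discount old propagations; that part is not sketched here.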
17. Outline
• Introduction
• Related Works
• Cold Start:
– Random Walk and Probability Assignment
• Algorithm
• Temporal Decay Issues
• Credibility & Accordance
• Experiment Design
• Discussion and Conclusion
18. Cold Start Problem
• Similarity matrices are usually too sparse to capture actual dependencies
between items.
– An item i that has not been rated by any user who has rated item j gets a similarity score of 0
– However, these items would be found close to each other if another item t is
similar to both.
• The Random Walk Recommender captures these transitive associations at
various levels, in proportion to the length of the random walk.
• Parameterize the length of the walk according to the sparsity level of the rating
matrix through the continue probability (typically 0.8~0.85)
• Cold start users
– Users with few actions and plenty of relations/friends
– Users with plenty of actions and few relations/friends
• Not a cold user -> traditional CF works best!
– Newcomers with few actions and few relations
• Use a sigmoid function and α to balance the ratio between user similarity and
social influence
19. Consider the State-of-the-art
Recommendation
• Matrix Factorization methods [2] still dominate
if you only care about value accuracy:
– Highly effective: learned from the training dataset
– Low efficiency: high complexity and memory costs
– No quality indicator or source explainability
– “Latent” factors lack physical meaning
– Centralized information is needed.
• Network methods
– Based on neighborhood similarity: distributed
– Random walk with lower complexity
– Feasible to update immediately
21. User Similarity vs. Node Distance
• Uni-partite previous
• Katz centrality with penalty beta
• Similar to PageRank
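As a rough illustration of a Katz-style measure with penalty β, the sketch below sums β^l times the number of length-l paths between two nodes, truncated at a maximum length. The truncation, the value of `beta`, and the toy graph are assumptions made for the example.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_scores(adj, beta=0.1, max_len=5):
    """Truncated Katz measure: sum over l = 1..max_len of beta^l * A^l."""
    n = len(adj)
    total = [[0.0] * n for _ in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0
    weight = 1.0
    for _ in range(max_len):
        power = mat_mul(power, adj)   # A^l counts paths of length l
        weight *= beta                # longer paths are penalized more
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * power[i][j]
    return total


# Hypothetical undirected path graph 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
katz = katz_scores(adj)
```

Directly connected nodes (0 and 1) score higher than nodes two hops apart (0 and 2), so the score behaves like an inverse node distance.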
22. Outline
• Introduction
• Related Works
• Cold Start:
– Random Walk and Probability Assignment
• Algorithms
– Item-based Random Walk
– User-based Random Walk
– Influence-based Random Walk
• Temporal Decay Issues
• Credibility & Accordance
• Experiment Design
• Discussion and Conclusion
23. Item-Based Random Walk (ItemRW)
• Construct the item-based similarity matrix
– By Jaccard index
• The random walk process:
– Denote by Yu,i the random variable for selecting an item j,
among the items rated by u, that is similar to i.
– Generalized by a sigmoid function, with the number of
common neighbors in the exponent
Liben‐Nowell, David, and Jon Kleinberg. "The link‐prediction problem for social
networks." Journal of the American Society for Information Science and
Technology 58.7 (2007): 1019-1031.
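Building the item-based similarity matrix with the Jaccard index can be sketched as below. The data and function names are hypothetical; each item is represented by the set of users who acted on it.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def item_similarity(users_by_item):
    """Pairwise Jaccard similarity between items, keyed by (item, item)."""
    items = sorted(users_by_item)
    return {(i, j): jaccard(users_by_item[i], users_by_item[j])
            for i in items for j in items if i < j}


# Hypothetical data: which users acted on each item.
users_by_item = {"i1": {"u1", "u2", "u3"}, "i2": {"u2", "u3"}, "i3": {"u4"}}
sim = item_similarity(users_by_item)
```

Items with no common users (here `i1` and `i3`) get similarity 0, which is exactly the sparsity the random walk is meant to work around.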
24. User-Based Random Walk (UserRW)
• Construct the user-based similarity
• The random walk process:
– Denote by Xu,i the random variable for selecting a
user v, among all users, who is similar to u.
– Pick a nearest neighbor v and output the action
set of v.
25. Influence-Based Random Walk
Algorithm
1. Build the item and user graph with correlations
2. Learn influence power by parsing the trust corpus
3. Perform a random walk on the graph to get the rank list.
• To perform a random walk, the needed information can be
acquired on user request, in a distributed manner.
• To validate the algorithm, we compute the expected value and
sort the state probabilities of each item.
– Most of them remain 0 -> no need to parse the full item vector I to perform
matrix operations
[Figure: pipeline: Build graph → Learn influence (updated) → Random walk to produce rank list]
26. Learning Influence - Graph
[Figure: two-layer graph with a User Layer and an Item Layer]
• u1 takes action i1 at timestamp t1: {u1, i1, t1}
• u2 takes the same action i1 at timestamp t2: {u2, i1, t2}
• Influence Power: △t = (t2-t1)
(Goyal et al. [6])
28. Sigmoid Smoothing
• Adjusts the weight when there are fewer related items
• A sigmoid function is a mathematical
function having an "S" shape (sigmoid
curve). Often, "sigmoid function" refers to the
special case of the logistic function,
defined by the formula S(t) = 1 / (1 + e^(−t))
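One way to use the logistic function for this kind of smoothing is to damp a similarity score that rests on only a few common users. The sketch below assumes the damping form used in the TrustWalker paper, sigmoid of half the number of common users; whether this deck uses exactly that form is not confirmed by the slide, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def smoothed_similarity(raw_similarity, n_common):
    """Damp a similarity that is supported by few common users (assumed form)."""
    return raw_similarity * sigmoid(n_common / 2.0)
```

With this choice a similarity backed by one common user is scaled to roughly 62% of its raw value, while one backed by ten common users keeps more than 99% of it.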
30. Influence-based User Transition Probabilities
• Influence-based user random walk probability:
α*user-similarity(u,v) + (1-α)*Influence Power(u,v)
[Figure: User Layer]
Yildirim, Hilmi, and Mukkai S. Krishnamoorthy. "A random walk method for alleviating
the sparsity problem in collaborative filtering." Proceedings of the 2008 ACM
conference on Recommender systems. ACM, 2008.
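The blended weight α*user-similarity(u,v) + (1-α)*Influence Power(u,v) can be turned into transition probabilities as sketched below. Normalizing the weights so they sum to 1 over the candidate users is an assumption, as are the function names and the toy values.

```python
def transition_probs(u, candidates, similarity, influence, alpha=0.5):
    """Blend similarity and influence power, then normalize over candidates."""
    raw = {v: alpha * similarity(u, v) + (1 - alpha) * influence(u, v)
           for v in candidates}
    total = sum(raw.values())
    return {v: w / total for v, w in raw.items()} if total else {}


# Hypothetical scores for target user "u" and candidates "a" and "b".
SIM = {("u", "a"): 0.4, ("u", "b"): 0.2}
INF = {("u", "a"): 0.2, ("u", "b"): 0.0}
probs = transition_probs(
    "u", ["a", "b"],
    similarity=lambda x, y: SIM[(x, y)],
    influence=lambda x, y: INF[(x, y)],
)
```

Setting `alpha` to 1 reduces this to a purely similarity-based walk, and setting it to 0 to a purely influence-based one.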
31. Outline
• Introduction
• Related Works
• Cold Start:
– Random Walk and Probability Assignment
• Algorithms
• Temporal Decay Issues
– ItemBetw
– PastDecay
• Credibility & Accordance
• Experiment Design
• Discussion and Conclusion
32. Exponential Time Decay Function
Dunlavy, Daniel M., Tamara G. Kolda, and Evrim Acar. "Temporal link prediction
using matrix and tensor factorizations." ACM Transactions on Knowledge Discovery
from Data (TKDD) 5.2 (2011): 10.
33. Time Interval Analysis
– ItemBetw
[Figure: two-layer graph with a User Layer and an Item Layer]
• User u takes action i1 at timestamp t1: {u, i1, t1}
• User u takes action i2 at timestamp t2: {u, i2, t2}
34. Time Interval – ItemBetw
• By assumption:
items that a user acted on within a short
interval gain higher similarity
“All of these items were my favorites in the past…”
Items that a user acted on a long interval
apart get a similarity close to 0
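The ItemBetw assumption can be illustrated with an exponential decay on the interval between two actions of the same user. The exact decay function used in the deck is not recoverable from the slide, so the form exp(-|t2 - t1| / tau) and the time constant `tau` are assumptions.

```python
import math


def item_betw_weight(t1, t2, tau=30.0):
    """exp(-|t2 - t1| / tau): near 1 for close actions, near 0 for distant ones."""
    return math.exp(-abs(t2 - t1) / tau)
```

Two items acted on at the same timestamp get weight 1, and the weight shrinks toward 0 as the interval between the two actions grows.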
35. Time Interval Analysis –
PastDecay
[Figure: two-layer graph with a User Layer and an Item Layer]
• User u takes action i at timestamp t1: {u, i, t1}
• User u takes action j at timestamp t2: {u, j, t2}
• User u takes action k at timestamp t3: {u, k, t3}
36. Time Interval - PastDecay
• By assumption:
items that a user acted on shortly before
now gain higher similarity
The newer, the better!
Items that a user acted on a long
time ago get a similarity close to 0
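PastDecay weights each action by its age relative to the present rather than by the gap between two actions. A minimal sketch, assuming an exponential form exp(-(now - t) / tau) with a hypothetical time constant `tau` and a made-up history:

```python
import math


def past_decay_weights(actions, now, tau=30.0):
    """Weight each (item, timestamp) action by exp(-(now - timestamp) / tau)."""
    return {item: math.exp(-max(0.0, now - t) / tau) for item, t in actions}


# Hypothetical history of user u: i at t1 = 10, j at t2 = 50, k at t3 = 95.
weights = past_decay_weights([("i", 10), ("j", 50), ("k", 95)], now=100)
```

The most recent action `k` receives the largest weight and the oldest action `i` the smallest, matching "the newer, the better".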
37. Outline
• Introduction
• Related Works
• Cold Start:
– Random Walk and Probability Assignment
• Algorithms
• Temporal Decay Issues
• Credibility & Accordance
• Experiment Design
• Discussion and Conclusion
38. Credibility & Accordance
• What evidence can we use to examine the recommendation
quality of an algorithm?
– The rank of the testing item in our rank list!
– Best case: rank = 1; presented as an average percentage:
rank 3 out of top-15 => credibility u,i = 20%
– A metric for a recommender system/method
• Select the highest probability of the related item/user
as the reference
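The slide gives a single worked example for credibility (rank 3 in a top-15 list gives 20%). The sketch below assumes the general rule is rank divided by list length, which reproduces that example; the function name and the handling of a missing item are hypothetical.

```python
def credibility(ranked_items, test_item):
    """Rank of the test item as a fraction of the list length; lower is better."""
    if test_item not in ranked_items:
        return None  # assumed: undefined when the test item was not recommended
    rank = ranked_items.index(test_item) + 1
    return rank / len(ranked_items)


# Hypothetical top-15 list in which the testing item "item3" is ranked third.
top15 = ["item%d" % k for k in range(1, 16)]
score = credibility(top15, "item3")
```

Averaging this value over all test users would give the average percentage mentioned on the slide.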
39. Outline
• Introduction
• Related Works
• Cold Start:
– Random Walk and Probability Assignment
• Algorithms
• Temporal Decay Issues
• Credibility & Accordance
• Experiment Design
– ALL-BUT-ONE Evaluation
– Dataset Description
– Experimental Result
• Discussion and Conclusion
40. ALL-BUT-ONE Evaluation
• Also called the “leave-one-out” method
• Predict the last item i that target user u acted on
• Output the top-N list; if it contains the held-out item, call it
a HIT
[Figure: Item Layer timeline {u, i1, t1}, {u, i2, t2}, {u, i3, t3}; the last action {u, i?, tlast} is held out]
• Hit ratio = (number of hits) / L, with L the testing set size.
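The all-but-one protocol can be sketched as a small evaluation loop. The `recommend` callable stands in for any of the recommenders discussed in this deck and is hypothetical, as is skipping users with fewer than two actions.

```python
def hit_ratio(recommend, histories, n=10):
    """Hide each user's last action and count how often the top-n contains it.

    recommend: function (user, training_items, n) -> list of item ids
    histories: dict user -> list of (item, timestamp)
    """
    hits = tested = 0
    for user, actions in histories.items():
        if len(actions) < 2:
            continue                        # assumed: nothing left to train on
        ordered = sorted(actions, key=lambda a: a[1])
        held_out = ordered[-1][0]           # the last item the user acted on
        training = [item for item, _ in ordered[:-1]]
        tested += 1
        if held_out in recommend(user, training, n)[:n]:
            hits += 1                       # a HIT
    return hits / tested if tested else 0.0


# Hypothetical histories and a trivial recommender that always suggests "b".
histories = {"u1": [("a", 1), ("b", 2)], "u2": [("a", 1), ("c", 3)]}
ratio = hit_ratio(lambda user, training, n: ["b"], histories)
```

Here `tested` plays the role of L, the testing set size.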
41. Dataset of Experiment
• Bookmark data
• 68,215 bookmark URLs from 1,867 users
• Friendship (“become mutual fans”) with timestamp
information
<source_user, target_user, timestamp>
• Actions also carry timestamps, used to measure the interval
influence.
<user, item, timestamp>
49. Outline
• Introduction
• Related Works
• Cold Start:
– Random Walk and Probability Assignment
• Algorithms
• Temporal Decay Issues
• Credibility & Accordance
• Experiment Design
• Discussion and Conclusion
50. Discussion - Why TrustWalker Fails?
• TrustWalker puts more emphasis on locally
trusted users than on globally similar users.
• It minimizes the mean squared error:
– Similar to a non-personalized popular list
• As mentioned, a top-N result is more user-friendly
• TrustWalker was evaluated on the Epinions dataset
– Users become fans of experts and columnists
– Trust > global similarity
51. Discussion:
Influence Based Random Walk
• α is near 0.001
– On a different scale from user similarity
• Like a decision tree:
– Similarity is the primary and
influence power the secondary comparison metric
[Figure: comparison metrics sim(u,v) and Influence Power(u,v)]
52. Discussion: Time Interval Decay
• Performance peaks when all the data keep the same
weight regardless of time.
• “In a predefined dataset, you should not readily
abandon or underestimate the value of old data.”
53. Conclusion
• Proposed a novel influence-based method.
– Influence-based Random Walk
– Intersection of item and user
• Probed and leveraged influence probabilities and
user correlation for cold start users
• Provided Credibility and Accordance for user
experience and feedback in RS
• Analyzed time decay with two decay
functions
– PastDecay
– ItemBetw