Using Trust in Recommender Systems: an experimental analysis
1. Using Trust in Recommender Systems: an experimental analysis
Paolo Massa
University of Trento
(joint work with Bobby Bhattacharjee, UMD)
2. Motivation:
1. Recommender Systems recommend
items the user might like, based on
past ratings.
2. Now, Decentralized publishing of info:
– Ratings on Items
– Trust on Principals
[Semantic Web]
3. New issues (sparseness, scalability,
trust, attacks, ...)
... Trust-aware Decentralized RS
3. Summary
1. Recommender Systems (RSs)
– Weaknesses
2. Solution: trust-awareness
– Trust and trust metrics
3. Experiments on Epinions.com
– Evidence trust solves RSs problems
– (~50,000 users!)
4. Future work
4. Collaborative Filtering (CF)
1. Input: ratings given by users to items
● I rate “Titanic” 4/5
2. I ask for a recommendation
3. RS computes my similarity
to every other user
● Pearson correlation coefficient
4. RS finds similar users and suggests to
me items liked by them.
5. [Figure: example ratings matrix, rows User1 to User4, columns Item 1 to Item 4;
User1's row reads 2, 5, ?, 5, where “?” is the rating to predict]
It does not consider the content of the items, only
the ratings given by users.
It works independently of the domain (also jokes)
BUT
Overlapping of rated items required!
6. RSs weaknesses
1. Ratings Matrix sparseness (95-99%)
– Low or No overlapping (users not comparable)
2. Cold start
– New users have 0 ratings (->not comparable)
3. Easy Attacks by Malicious Users
– Copy profile and become the most similar
4. Hard to understand and control
– Black box (bad recs -> user gives up)
Solution? Trust of course!
7. Trust-awareness
1. Trust statement = rating given by one
person to another about her usefulness
(e.g. in providing good movie reviews)
2. Explicitly provided
3. Trust is subjective! T(A,Z)=1 & T(B,Z)=0
– No Global BAD principals!!!
4. Trust is asymmetric! I trust Bill Gates.
5. FOAF (Friend-Of-A-Friend) is an XML
format to express relationships
– Millions of files out there...
9. Trust metrics
1. Task: based on known trust edges,
predict trustworthiness of principals
2. Trust propagation (A->B,B->C|A-?->C)
3. Global (pagerank, ebay, ...)
4. Local (personalized)
ME
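The difference between a global and a local trust metric can be sketched in a few lines. This is only an illustration: the names `global_trust` and `local_trust`, the in-degree style global score, and the linear decay with distance are assumptions made for the example, not the metric used in this work.

```python
from collections import deque

def global_trust(trust, target):
    """Global view (assumed): fraction of other users who trust `target`.
    The value is the same whoever asks."""
    users = set(trust) | {v for vs in trust.values() for v in vs}
    others = users - {target}
    incoming = sum(1 for u in others if target in trust.get(u, ()))
    return incoming / len(others) if others else 0.0

def local_trust(trust, source, target, max_distance=3):
    """Local view (assumed): propagate trust from `source` along trust
    edges; the score decays linearly with the shortest-path distance."""
    if source == target:
        return 1.0
    distance = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if user == target:
            return (max_distance - distance[user] + 1) / max_distance
        if distance[user] == max_distance:
            continue  # propagation horizon reached
        for trusted in trust.get(user, ()):
            if trusted not in distance:
                distance[trusted] = distance[user] + 1
                queue.append(trusted)
    return 0.0  # no trust path within the horizon

# A->B, B->C: what can A say about C?
trust = {"A": ["B"], "B": ["C"], "Z": ["C"]}
print(local_trust(trust, "A", "C"))  # propagated from A's point of view
print(local_trust(trust, "C", "A"))  # asymmetric: no path back
print(global_trust(trust, "C"))      # one value for everybody
```

The local score depends on who is asking, which is what makes it personalized; the global score cannot express T(A,Z)=1 and T(B,Z)=0 at the same time.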
10. Trust solves RS problems
1. Trust solves CF sparseness problem
– trust propagation and “6 degrees” -> reach many users
2. Trust solves Cold Start problem
– “just add 1 friend”
3. Trust metrics are resistant to the copy-profile attack.
– “you can be similar, but if there is no trust path to you ...”
4. Trust is easier to understand and control
– trust networks support explanation (HCI tests needed)
EVIDENCE of 1 and 2 provided by analyzing a
REAL, VAST community (Epinions.com)
11. Experiment: Epinions.com
1. Epinions.com users can
– Review and rate items (from 1 to 5)
– Keep a web of trust (trust=1) and a block list (trust=0).
– “Reviewers whose reviews and ratings you have
consistently found to be valuable” (Epinions FAQ)
2. Dataset (by crawling site):
– ~50K users, ~140K items, ~660K ratings.
– ~500K trust statements.
• No block list (not shown on site)
12. Epinions' recommendations
Taking one user, “ME”, we can
- use CF on ratings and compute the
“similarity” of other users
- use a Trust Metric and compute the
“trustworthiness” of other users
Then we can suggest items liked by similar
or trusted users.
For how many users are these measures
computable?
15. User Similarity Computability
1. Ideally, every user should be
comparable against every other user.
2. BUT ratings sparseness = 99.99135%
-> tiny overlapping between 2 users
3. Pearson correlation coefficient
meaningful only if overlapping(A,U)>1
4. Question: given one user, how many
other users are comparable?
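The comparability question can be sketched as follows. This is a minimal illustration, assuming ratings are held as a dict mapping each user to a dict of item ratings; the function name `comparable_users` and the toy data are hypothetical.

```python
def comparable_users(ratings, user, min_overlap=2):
    """Users sharing at least `min_overlap` rated items with `user`.
    min_overlap=2 mirrors the condition overlapping(A,U) > 1."""
    mine = set(ratings[user])
    return [other for other, theirs in ratings.items()
            if other != user and len(mine & set(theirs)) >= min_overlap]

# Toy data (made up): user -> {item: rating}
ratings = {
    "me":  {"a": 4, "b": 2, "c": 5},
    "u1":  {"a": 5, "b": 1},
    "u2":  {"c": 3},
    "new": {},                      # cold start user, no ratings yet
}
print(comparable_users(ratings, "me"))   # only u1 overlaps on 2 items
print(comparable_users(ratings, "new"))  # nobody is comparable
```

Averaging `len(comparable_users(...))` over all users gives the kind of mean number of comparable users discussed on the following slides.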
16. US computability (cont.)
1. For each user, we computed the set of
comparable users.
– On average a user has 161 comparable
users (ideally ~50,000!)
2. We then averaged
#comparable_users over users who
wrote the same number of
reviews.
17. US computability (cont.)
Cold Start Users
Ex: users with 40 reviews have ~800 comparable users.
BUT the total number of users (y axis) is ~50,000!
And for Cold Start Users (>50% of all users) the mean is only 2.74
18. Trust computability
1. Trust metrics predict trust in unknown
users based on known trust
statements.
2. Distance from ME to U is a first
measure of Trust computability
3. On average,
– In 2 steps, reach 400 users
– In 3 steps, reach 4386 users
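Counting the users reachable within k steps is a breadth-first search over the trust graph. The sketch below assumes the trust statements are stored as a dict mapping each user to the list of users she trusts, and that the counts are cumulative (everyone within k steps); the function name `reachable_within` and the toy graph are hypothetical.

```python
from collections import deque

def reachable_within(trust, source, max_steps):
    """Return [n_1, ..., n_max_steps], where n_k is the number of users
    reachable from `source` in at most k steps along trust edges."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if distance[user] == max_steps:
            continue  # do not expand beyond the horizon
        for trusted in trust.get(user, ()):
            if trusted not in distance:
                distance[trusted] = distance[user] + 1
                queue.append(trusted)
    return [sum(1 for d in distance.values() if 1 <= d <= k)
            for k in range(1, max_steps + 1)]

# Toy trust graph (made up): user -> users she trusts
trust = {
    "me": ["a", "b"], "a": ["c"], "b": ["c", "d"],
    "c": ["e"], "e": ["me"], "new": ["a"],
}
print(reachable_within(trust, "me", 3))   # two friends already open up the graph
print(reachable_within(trust, "new", 3))  # one trust statement is enough to start
```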
19. Mean # Reachable Users (in k steps) for users
expressing X trust statements
In a few steps, you can predict trust in every user!
Even for Cold Start Users!!!
20. Trust and US computability comparison
Mean number of comparable users:
                    Propagating Trust                    Using
                    Dist 1   Dist 2   Dist 3   Dist 4    Pearson
All users           9.88     400      4386     16334     161
Cold Start users    2.14     94.54    1675     9121      2.74
21. Contribution
Experimental evidence that
– CF is ineffective in real world scenarios
• Especially for Cold Start users.
– Trust can solve CF problems
• Sparseness
• Cold Start
• Attacks (self-evident)
Trust is computable on many more users than
user similarity
Especially for cold start users (the majority!)
22. Future work
1. Do US and Trust correlate? Contradict?
– Is US among trusted users higher than average?
2. Distrust?
– Propagation? Properties?
3. Design a Trust Metric (for RS)
– Create and evaluate a Trust-aware RS
• Input data
23. Thanks for your attention!
Questions?
Paolo Massa
Email: massa@itc.it
Blog: http://moloko.itc.it/paoloblog/index.html
24. Collaborative Filtering
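Similarity measure: Pearson correlation coefficient of users a and u
$$w_{a,u} = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)^2 \, \sum_{i=1}^{m} (r_{u,i} - \bar{r}_u)^2}}$$
Prediction of the rating given by user a to item i
$$p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} (r_{u,i} - \bar{r}_u) \, w_{a,u}}{\sum_{u=1}^{n} w_{a,u}}$$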
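The similarity and prediction steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used in the experiments: the similarity sums run over co-rated items with means taken over those items, the prediction uses each user's mean over all her ratings, and only neighbours with positive similarity contribute (so the denominator stays positive). The names `pearson` and `predict` and the toy data are hypothetical.

```python
from math import sqrt

def pearson(ratings_a, ratings_u):
    """Pearson correlation over co-rated items; None if not computable."""
    common = set(ratings_a) & set(ratings_u)
    if len(common) < 2:
        return None  # overlap too small to be meaningful
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    den = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)
               * sum((ratings_u[i] - mean_u) ** 2 for i in common))
    return num / den if den else None  # constant ratings: undefined

def predict(ratings, a, item):
    """Predicted rating of user `a` for `item`; None if no usable neighbour."""
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    num = den = 0.0
    for u, ratings_u in ratings.items():
        if u == a or item not in ratings_u:
            continue
        w = pearson(ratings[a], ratings_u)
        if w is None or w <= 0:
            continue  # assumption: keep positively correlated users only
        mean_u = sum(ratings_u.values()) / len(ratings_u)
        num += (ratings_u[item] - mean_u) * w
        den += w
    return mean_a + num / den if den else None

# Toy data (made up): user -> {item: rating}
ratings = {
    "me": {"i1": 2, "i2": 5, "i4": 5},
    "u2": {"i1": 2, "i2": 5, "i3": 5, "i4": 5},
    "u3": {"i1": 5, "i2": 1, "i3": 3},
}
print(pearson(ratings["me"], ratings["u2"]))  # identical tastes on co-rated items
print(predict(ratings, "me", "i3"))           # driven by u2 only
```

A user with no ratings gets no prediction at all here, which is the cold start problem in miniature.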
25. Hard Trust and Soft Trust
1. Vocabulary:
– Hard Trust: about security, identity of
something (user, device, information)
• Public key cryptography
– Soft Trust: appreciation of some principal
(explicitly provided by another principal)
• Social Networks and Trust Metrics