2. OVERVIEW
● Introduction
● Task Definition
● Schema
● Combating web spam with TrustRank
● Propagation of Trust and Distrust
● The EigenTrust algorithm for reputation management in p2p networks
● Attack-Resistant Trust Metrics for Public Key Certification
● Dataset Suggestions
● Conclusion
3. Introduction
● Given the open nature of social networks and their current level of popularity,
users are increasingly concerned about privacy and security;
● We need to trust the entities that belong to our social network;
● To achieve that, a “Web of Trust” should be introduced;
● In order to balance the open nature of social networks with users' privacy
concerns, it is important to build “Trust Communities”.
4. Task Definition [2]
Challenges
● Users sometimes adopt many personas and express a large number of biased opinions.
● Trust is difficult to define.
Importance
● In e-commerce, a trust model can increase the value of a product.
● Trusted users will have greater influence and perks, which can have a positive effect on user behaviour.
● Better recommendations.
Applications (Internet networks)
● Social networks
● P2P networks
● Certificate networks
● Mail networks
5. Schema
Web of Trust
● Methodologies: TrustRank; Trust & Distrust
● Case Studies: EigenTrust on P2P; Digital Certificates
7. Gyöngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004, August).
Combating web spam with TrustRank.
In Proceedings of the Thirtieth international conference on Very large data
bases-Volume 30 (pp. 576-587). VLDB Endowment.
8. TrustRank overview
Gyöngyi et al. proposed techniques to semi-automatically separate reputable web
pages from spam. First, a small set of seed pages is selected and evaluated by a
human expert. Once the reputable seed pages have been identified manually, the
link structure of the web is exploited to discover other pages that are likely to be
good as well. The benchmark dataset is AltaVista’s web index as of 2003.
9. Contribution
1. Formalization of the web spam problem and of spam detection algorithms.
2. Metrics for assessing the efficacy of detection algorithms.
3. Schemes for selecting seed sets of pages to be manually evaluated.
4. Introduction of the TrustRank algorithm for determining the likelihood that pages
are reputable.
5. An extensive evaluation, based on 31 million sites crawled by the AltaVista
search engine, and a manual examination of over 2,000 sites.
10. Assessing trust
The creators of good pages can sometimes be “tricked,” so we do find some good-to-bad links on the web.
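11. Assessing trust
Oracle function
$O(p) = \begin{cases} 0 & \text{if } p \text{ is bad} \\ 1 & \text{if } p \text{ is good} \end{cases}$
Trust function
$T(p) = \Pr[O(p) = 1]$
Ordered Trust Property
$T(p) < T(q) \Leftrightarrow \Pr[O(p) = 1] < \Pr[O(q) = 1]$
$T(p) = T(q) \Leftrightarrow \Pr[O(p) = 1] = \Pr[O(q) = 1]$
Threshold value $\delta$
$T(p) > \delta \Leftrightarrow O(p) = 1$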
15. Trust Attenuation
The further away we are from the good seed pages, the less certain we are that a
page is good. For instance, in Figure 2 there are 2 pages (namely, pages 2 and 4)
that are at most 2 links away from the good seed pages. As both of them are good,
the probability that we reach a good page in at most 2 steps is 1. Similarly, the
number of pages reachable from the good seed in at most 3 steps is 3. Only two of
these (pages 2 and 4) are good, while page 5 is bad, so the probability of reaching
a good page drops to 2/3. These observations suggest that we reduce trust as we
move further and further away from the good seed pages.
16. Trust Attenuation
Trust dampening. Since page 2 is one link away from the good seed page 1, we
assign it a dampened trust score of β, where β < 1. Since page 3 is reachable in
one step from page 2 with score β, it gets a dampened score of β · β.
Trust splitting. If a good page has only a handful of outlinks, then the pages it
points to are likely to be good as well. However, if a good page has hundreds of
outlinks, it is more probable that some of them point to bad pages.
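Both attenuation ideas can be sketched on an adjacency-list graph. This is a minimal sketch, not the paper's code: dampening gives a page k links away from a seed the score β^k, and when a page is reachable along several paths it keeps the highest score (an assumed combination rule); splitting divides a page's score evenly among its outlinks and each target sums the shares it receives. The function names and the `max_depth` cutoff are hypothetical.

```python
def dampened_trust(out_links, seeds, beta=0.85, max_depth=3):
    """Trust dampening: a page k links away from a seed receives beta**k.

    If a page is reachable along several paths it keeps the highest score
    (an assumed combination rule). max_depth is an illustrative cutoff.
    """
    trust = {s: 1.0 for s in seeds}
    frontier = list(seeds)
    for _ in range(max_depth):
        next_frontier = []
        for p in frontier:
            for q in out_links.get(p, []):
                score = trust[p] * beta
                if score > trust.get(q, 0.0):
                    trust[q] = score
                    next_frontier.append(q)
        frontier = next_frontier
    return trust


def split_trust(out_links, trust):
    """Trust splitting: each page divides its score evenly among its outlinks;
    every target sums the shares it receives."""
    received = {}
    for p, score in trust.items():
        outs = out_links.get(p, [])
        for q in outs:
            received[q] = received.get(q, 0.0) + score / len(outs)
    return received
```

On the chain 1 → 2 → 3 with seed page 1 and β = 0.5, dampening yields β for page 2 and β · β for page 3, as on the slide: `dampened_trust({1: [2], 2: [3]}, [1], beta=0.5)` gives `{1: 1.0, 2: 0.5, 3: 0.25}`.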
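18. TrustRank example
$s = (0.08, 0.13, 0.08, 0.10, 0.09, 0.06, 0.02)$
$\sigma = (2, 4, 5, 1, 3, 6, 7)$ ($L = 3$, seed set is $\{2, 4, 5\}$)
$d = (0, \tfrac{1}{2}, 0, \tfrac{1}{2}, 0, 0, 0)$ (the oracle marks seeds 2 and 4 as good and seed 5 as bad)
($\alpha_B = 0.85$ and $M_B = 20$)
$t^* = (0, 0.18, 0.12, 0.15, 0.13, 0.05, 0.05)$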
19. TrustRank
TrustRank usually gives good pages a higher score. In particular, three
of the four good pages (namely, pages 2, 3, and 4) got high scores and
two of the three bad pages (pages 6 and 7) got low scores. However,
the algorithm failed to assign pages 1 and 5 adequate scores. Page 1
was not among the seeds, and it did not have any inlinks through which
to accumulate score, so its score remained at 0. All good unreferenced
web pages receive a similar treatment, unless they are selected as
seeds. Bad page 5 received a high score because it is the direct target
of one of the rare good-to-bad links.
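The computation behind the example can be sketched as follows, following the paper's description as we read it: pages are ordered by the seed scores s, the oracle is consulted on the top L pages, the good seeds share the static score d equally, and M_B iterations of biased PageRank t* = α_B · T · t* + (1 − α_B) · d propagate trust, each page splitting its score evenly among its outlinks. Pages are 0-indexed in the code (slide page k is index k − 1), the oracle is a caller-supplied function, and the function names are hypothetical. The link graph of the slide's seven-page example is not given here, so the test uses a small made-up graph.

```python
def seed_order(s):
    """Rank pages by decreasing seed score s (stable: ties keep index order)."""
    return sorted(range(len(s)), key=lambda p: -s[p])


def static_distribution(s, oracle, limit):
    """Ask the oracle about the top `limit` pages; spread score evenly over good seeds."""
    d = [0.0] * len(s)
    for p in seed_order(s)[:limit]:
        if oracle(p) == 1:
            d[p] = 1.0
    total = sum(d)
    if total == 0:
        raise ValueError("no good seeds among the pages shown to the oracle")
    return [x / total for x in d]


def trustrank(out_links, d, alpha_b=0.85, m_b=20):
    """M_B iterations of t* = alpha_B * T * t* + (1 - alpha_B) * d.

    T[p][q] = 1 / outdegree(q) if q links to p, so each page splits its
    score evenly among its outlinks.
    """
    n = len(d)
    t = list(d)
    for _ in range(m_b):
        nxt = [(1 - alpha_b) * d[p] for p in range(n)]
        for q, outs in out_links.items():
            for p in outs:
                nxt[p] += alpha_b * t[q] / len(outs)
        t = nxt
    return t
```

With the slide's s and an oracle that marks pages 1 to 4 as good, `seed_order` reproduces σ = (2, 4, 5, 1, 3, 6, 7) and `static_distribution(..., limit=3)` reproduces d. As with page 1 on the slide, a page that is neither a seed nor linked from any page keeps a TrustRank score of exactly 0.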
20. Experiments
To evaluate the algorithms, the authors performed experiments using the
complete set of pages crawled and indexed by the AltaVista search engine as of
August 2003. To reduce computational demands, they worked with web sites
instead of individual pages, grouping the several billion pages into 31,003,946
sites using a proprietary algorithm that is part of the AltaVista engine. More than
one third of the sites (13,197,046) were unreferenced. The first author of the paper
played the role of the oracle, examining pages of various sites, determining
whether they are spam, and performing additional classification. The manual
evaluations took weeks.
21. Evaluation
The evaluation sample consists of 1,000 sites that were not selected uniformly at
random. With a random sample, a large number of the sites would be very small
(with few pages) and/or have very low PageRank. It is more important to correctly
detect spam in high-PageRank sites, since they will more often appear high in query
result sets.
22. Evaluation
Virtually no spam in the top 5 TrustRank buckets, while there is a marked increase
in spam concentration in the lower buckets.
At the same time, it is surprising that almost 20% of the second PageRank bucket
is bad.
23. Precision & Recall
TrustRank assigned the highest scores to good sites, and the proportion of bad
sites increases gradually as we move to lower scores. Hence, as the threshold is
lowered, precision decreases and recall increases almost linearly.
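Precision and recall here depend on the threshold δ from the "Assessing trust" slide: pages with T(p) > δ are labelled good, and the labels are checked against the oracle. The sketch below uses the usual definitions (precision: fraction of pages above δ that are good; recall: fraction of good pages that land above δ). Returning 1.0 when a denominator is empty is an assumed convention, and the function name is hypothetical. The test reuses the rounded t* values of the seven-page example, with pages 1 to 4 good.

```python
def precision_recall(trust, oracle, delta):
    """Precision and recall of the rule "T(p) > delta means good".

    trust:  page -> trust score T(p)
    oracle: page -> 1 (good) or 0 (bad)
    """
    predicted_good = [p for p, score in trust.items() if score > delta]
    hits = sum(1 for p in predicted_good if oracle[p] == 1)
    actual_good = sum(1 for p in trust if oracle[p] == 1)
    # Empty denominators: treat as perfect (an assumed convention).
    precision = hits / len(predicted_good) if predicted_good else 1.0
    recall = hits / actual_good if actual_good else 1.0
    return precision, recall
```

Lowering δ from 0.14 to 0.10 on the example moves (precision, recall) from (1.0, 0.5) to (0.75, 0.75), the same trade-off the slide describes at web scale.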
24. Conclusion
Experimental results show that we can effectively
identify a significant number of strongly reputable
(non-spam) pages. In a search engine, TrustRank can be
used either separately to filter the index, or in
combination with PageRank and other metrics to rank
search results.
25. Guha, R., Kumar, R., Raghavan, P. & Tomkins, A. (2004, May).
Propagation of Trust and Distrust.
In Proceedings of the 13th international conference on World Wide Web (pp.
403-412).
26. Propagation of Trust and Distrust
Guha et al. developed a formal framework of propagation schemes that uses both
trust and distrust to estimate the “belief” of a user in any other user.
● Why Distrust?
When modelling one user's opinion of another, distrust is as important as trust,
if not more so. Their results show that using distrust in retailing /
recommendation networks is useful and improves the accuracy of the
predictions.
27. Distrust Challenges
Challenge 1: How to model
“Does a trust score of 0 translate to distrust or to ‘no opinion’?” [2]
Challenge 2: Chain distrust
How can one apply distrust along a chain of users? What if there is a chain of
distrust?
Challenge 3: Algorithmic challenges
The principal eigenvector of a trust matrix that includes distrust need not be real,
which raises algorithmic issues (the matrix can no longer be treated as a Markov
chain).
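29. Methodology
Pipeline: trust and distrust matrices (T, D) → belief matrix B → propagation: apply $C_{B,\alpha}$ k times → $P^{(k)}$ → final matrix F → rounding
● Atomic propagation ($\alpha$):
○ Direct only: $\alpha = e_1$
○ Co-citation: $\alpha = e_2$
○ Combined (all 4): $\alpha = (0.4, 0.4, 0.1, 0.1)$
● Distrust:
○ Trust only
○ One-step distrust
○ Propagated distrust
● Iteration:
○ EIG: $F = P^{(k)}$
○ WLC (weighted linear combination) with constant $\gamma$ ($\gamma = 0.5$ / $\gamma = 0.9$)
● Rounding:
○ Global
○ Local
○ Majority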
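The propagation step can be sketched in pure Python, following the paper's definitions as we read them. The operator combines the four atomic propagations, $C_{B,\alpha} = \alpha_1 B + \alpha_2 B^T B + \alpha_3 B^T + \alpha_4 B B^T$ (direct propagation, co-citation, transpose trust, trust coupling). The three distrust variants are: trust only, with B = T and $P^{(k)} = C_{B,\alpha}^k$; one-step distrust, with B = T and $P^{(k)} = C_{B,\alpha}^k (T - D)$; and propagated distrust, with B = T − D and $P^{(k)} = C_{B,\alpha}^k$. Rounding and the EIG/WLC choice are left out, k ≥ 1 is assumed, and the helper names are hypothetical.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def propagation_operator(B, alpha):
    """C_{B,alpha} = a1*B + a2*B^T B + a3*B^T + a4*B B^T."""
    Bt = transpose(B)
    atoms = [B, matmul(Bt, B), Bt, matmul(B, Bt)]
    n = len(B)
    return [[sum(a * m[i][j] for a, m in zip(alpha, atoms)) for j in range(n)]
            for i in range(n)]


def propagate(T, D, alpha, k, distrust="one-step"):
    """Return P^(k) for one of the three distrust variants (k >= 1)."""
    if distrust == "trust-only":
        B, tail = T, None
    elif distrust == "one-step":
        B, tail = T, subtract(T, D)   # distrust applied once, at the last step
    elif distrust == "propagated":
        B, tail = subtract(T, D), None
    else:
        raise ValueError(f"unknown distrust variant: {distrust}")
    C = propagation_operator(B, alpha)
    P = C
    for _ in range(k - 1):
        P = matmul(P, C)
    return matmul(P, tail) if tail is not None else P
```

With direct propagation only ($\alpha = e_1$), if user 0 trusts user 1 and user 1 distrusts user 2, one-step distrust gives $P^{(1)}_{0,2} = -1$: user 0 inherits the distrust of someone it trusts, but that distrust is not propagated any further.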
30. Data
● Directed graph from Epinions
● 131,829 nodes & 841,372 edges
● Edge labels: trust or distrust (85% trust edges)
● Distribution: power law
● Structure: symmetric bow-tie
○ SCC of 41,500 nodes
○ 40,000 in SCC / 30,000 out of SCC
● Giant WCC of ~120,000 nodes
Results
● 81 different schemes
● Best combination:
○ k = 20
○ $\alpha = e^*$ (combined)
○ Majority rounding
○ EIG
○ One-step distrust
● Error (incorrect predictions): $e = 0.064$ and $e_S = 0.147$
32. Kamvar, S. D., Schlosser, M. T., & Garcia-Molina, H. (2003, May).
The EigenTrust algorithm for reputation management in p2p networks.
In Proceedings of the 12th international conference on World Wide Web (pp.
640-651). ACM.
33. EigenTrust Overview
1. Reputation system for a P2P network
2. Trust level
3. Robust against malicious peers and free riders
4. Rewards good behavior over several transactions
34. Trust level
A peer will trust:
1. Peers who have provided it with authentic files.
2. The opinions of those peers about other peers.
3. Known trustworthy (pre-trusted) peers.
35. Estimating Trust
● Terminology
○ Local trust value $c_{ij}$
■ The opinion peer i has of peer j, based on past experience
■ Each time peer i downloads an authentic file from peer j, $c_{ij}$ increases; each inauthentic download decreases it
○ Global trust value $t_i$
■ The trust that the entire system places in peer i
36. Estimating Trust Level
● Normalization of $c_{ij}$
○ Otherwise, malicious peers can assign arbitrarily high local trust values to other malicious peers
● Local trust vector $c_i$
○ Contains all local trust values $c_{ij}$ that peer i has of other peers j (the rows of the matrix $C = [c_{ij}]$)
● Iterative friend-of-friend reference:
○ Ask your friends: $t = C^T c_i$
○ Ask their friends: $t = (C^T)^2 c_i$
○ Keep asking: $t = (C^T)^n c_i$
○ For large n, the vector t computed by each peer i converges to the same vector for every peer (the left principal eigenvector of C)
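The normalization and the friend-of-friend iteration above can be sketched centrally. This is a minimal sketch under stated assumptions: raw local scores are clipped at 0 and each row is divided by its sum (the EigenTrust normalization, as we understand it), a row with no positive score simply stays zero (inactive peers are handled by a separate refinement), and the iteration t ← Cᵀ t starts from the peer's own vector $c_i$. The function names and the stopping tolerance are hypothetical.

```python
def normalize_local_trust(s):
    """c_ij = max(s_ij, 0) / sum_j max(s_ij, 0), row by row.

    Scaling a row by any positive factor leaves c_i unchanged, so a peer
    cannot inflate its opinions. Rows with no positive entry stay zero.
    """
    C = []
    for row in s:
        positive = [max(x, 0.0) for x in row]
        total = sum(positive)
        C.append([x / total for x in positive] if total > 0 else [0.0] * len(row))
    return C


def iterate_global_trust(C, c_i, max_iter=200, eps=1e-12):
    """Repeat t <- C^T t starting from c_i, i.e. t = (C^T)^n c_i."""
    n = len(C)
    t = list(c_i)
    for _ in range(max_iter):
        # (C^T t)_j = sum_i c_ij * t_i: opinions of j weighted by trust in i.
        nxt = [sum(C[i][j] * t[i] for i in range(n)) for j in range(n)]
        if sum(abs(x - y) for x, y in zip(nxt, t)) < eps:
            return nxt
        t = nxt
    return t
```

For three peers with raw scores `[[0, 3, 1], [2, 0, 2], [1, 1, 0]]`, starting from peer 0's or peer 1's local vector yields the same global vector (1/3, 7/18, 5/18), illustrating the convergence claim for a well-connected network.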
37. Practical issues and solutions
● A priori notions of trust
○ Define a distribution p over pre-trusted peers
● Inactive Peers
○ If peer i does not download from anybody else, or if it assigns a zero score to all other
peers, its local trust values are redefined so that it trusts the pre-trusted peers ($c_{ij} = p_j$)
● Malicious Collectives
○ This is addressed by having each peer place at least some trust in the pre-trusted peers,
which are assumed not to be part of a collective
38. Distributed EigenTrust
● Each peer stores its local trust vector $c_i$
● Each peer stores and computes its own global trust value $t_i$
● The update also includes the pre-trusted distribution p
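The distributed scheme, together with the practical fixes on the previous slide, can be simulated in synchronous rounds. The sketch assumes the per-peer update $t_i = (1 - a)(c_{1i} t_1 + \dots + c_{ni} t_n) + a\, p_i$, as we read the paper. Each peer keeps only its own row $c_i$ and its own $t_i$, sends $c_{ji} t_j$ to the peers it has opinions of, and falls back to $c_{ij} = p_j$ when it is inactive. The `Peer` class, the default a = 0.15 and the round count are hypothetical choices, and message passing is reduced to a shared inbox list.

```python
class Peer:
    """Local state of one peer (hypothetical simulation structure)."""

    def __init__(self, local_scores, p, pid):
        positive = [max(x, 0.0) for x in local_scores]
        total = sum(positive)
        # Normalized local trust vector c_i; an inactive peer (no positive
        # experience) falls back to the pre-trusted distribution: c_ij = p_j.
        self.c = [x / total for x in positive] if total > 0 else list(p)
        self.t = p[pid]  # t_i^(0) = p_i


def distributed_eigentrust(s, p, a=0.15, rounds=200):
    """Synchronous rounds of t_i = (1 - a) * sum_j c_ji * t_j + a * p_i."""
    n = len(s)
    peers = [Peer(s[i], p, i) for i in range(n)]
    for _ in range(rounds):
        inbox = [0.0] * n
        # Each peer j sends c_ji * t_j to every peer i it has an opinion of.
        for peer in peers:
            for i, c_ji in enumerate(peer.c):
                if c_ji > 0:
                    inbox[i] += c_ji * peer.t
        # Each peer i updates only its own global trust value.
        for i, peer in enumerate(peers):
            peer.t = (1 - a) * inbox[i] + a * p[i]
    return [peer.t for peer in peers]
```

In the test network, peer 0 is pre-trusted, peer 1 is inactive, and peers 2 and 3 form a malicious collective that rates only each other (with scores of 100). Because no honest or pre-trusted peer trusts them, both end with a global trust of exactly 0, however high their mutual ratings are.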
39. Secure EigenTrust
● A peer should not hold its own t
○ Problem: a malicious peer can report a false value
○ Solution: a different peer (a score manager) computes t for this peer
● t should not be computed by only one peer
○ Problem: a malicious peer can report a false value for another peer
○ Solution: multiple score managers
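The two rules above (no peer manages itself, several managers per peer) can be sketched as follows. The paper locates score managers through a distributed hash table; here salted SHA-256 hashes index into a sorted list of peer IDs as a stand-in for that lookup, and a querying peer accepts a value only if a strict majority of managers report it. Both choices are assumptions for illustration, and the function names and m = 3 are hypothetical.

```python
import hashlib
from collections import Counter


def score_managers(peer_id, all_peers, m=3):
    """Pick m distinct score managers for peer_id, never peer_id itself.

    Stand-in for DHT lookups: salted SHA-256 hashes index into a sorted
    list of peer IDs.
    """
    ring = sorted(all_peers)
    wanted = min(m, len(ring) - 1)
    managers, salt = [], 0
    while len(managers) < wanted:
        digest = hashlib.sha256(f"{salt}:{peer_id}".encode()).hexdigest()
        candidate = ring[int(digest, 16) % len(ring)]
        if candidate != peer_id and candidate not in managers:
            managers.append(candidate)
        salt += 1
    return managers


def majority_trust(reports):
    """Accept a reported t only if a strict majority of score managers agree."""
    value, count = Counter(reports).most_common(1)[0]
    return value if count > len(reports) // 2 else None
```

Because honest managers run the same deterministic computation, they report identical values, so a single malicious manager reporting 0.9 is outvoted by two honest ones reporting 0.4.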
40. Experiments
The performance of this scheme is assessed through simulations of a P2P
network. The number of peers is usually 100, and they are connected according
to a power-law model. Several threat models are executed on this network.
43. Levien, R., & Aiken, A. (1998, January).
Attack-Resistant Trust Metrics for Public Key Certification.
In Usenix Security.
44. Trust Metrics on Network Certificates
Certificate Applications:
● Authentication
● Data Integrity
● Encryption
45. Trust Metrics on Network Certificates
The digitally signed certificates form a directed graph, which serves as the model
for deploying and testing a number of trust metrics and for measuring the attack
resistance of a given certificate network.
Two types of certificates:
● Binding certificates: “I believe that subject key k is the key belonging
to name n”
● Delegation certificates: “I trust certificates signed
by subject key k”
46. Trust Metrics on Network Certificates
A good trust metric ensures that there are really multiple
independent sources of certification, and rejects assertions with
insufficient certification.
No trust metric can protect against attacks on d keys or more,
where d is the minimum number of certifiers on any widely
accepted key.
48. Trust Metrics on Network Certificates
Attack types:
● Node attack: the attacker is able to generate any certificate
from the attacked key (e.g., a stolen password).
● Edge attack: the attacker is only able to generate a delegation
certificate from the attacked key (e.g., by convincing the key owner).
49. Trust Metrics on Network Certificates
Maximum Network Flow Metric
Each node n in the graph is assigned a capacity
$C_{(s,t)}(n) = \max\left(f_s(\mathrm{dist}(s, n)),\ g_t(\mathrm{dist}(n, t))\right)$
where s = source, t = target, dist(n, t) = shortest-path distance, d = degree
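The slide gives only the node-capacity formula, so the following is a minimal sketch under stated assumptions. Each certificate is a directed edge of unlimited capacity, the source and target nodes are uncapacitated, and f and g are caller-supplied functions of the shortest-path distance (hypothetical here; an unreachable node is passed an infinite distance). Every node is split into an in/out pair so that its capacity becomes an edge capacity, and the maximum s-t flow is computed with Edmonds-Karp. The function names are hypothetical.

```python
from collections import deque

INF = float("inf")


def bfs_distances(adj, start):
    """Unweighted shortest-path distances from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def max_flow_metric(edges, s, t, f, g):
    """Max s-t flow where node n has capacity max(f(dist(s, n)), g(dist(n, t)))."""
    adj, radj = {}, {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        radj.setdefault(v, []).append(u)
    ds = bfs_distances(adj, s)    # dist(s, n)
    dt = bfs_distances(radj, t)   # dist(n, t), via reversed edges
    cap = {}

    def add_edge(u, v, c):
        cap.setdefault(u, {}).setdefault(v, 0.0)
        cap.setdefault(v, {}).setdefault(u, 0.0)  # residual reverse edge
        cap[u][v] += c

    # Split every node n into (n, "in") -> (n, "out") carrying its capacity.
    for n in set(adj) | set(radj) | {s, t}:
        c = INF if n in (s, t) else max(f(ds.get(n, INF)), g(dt.get(n, INF)))
        add_edge((n, "in"), (n, "out"), c)
    for u, v in edges:  # certificates themselves are uncapacitated
        add_edge((u, "out"), (v, "in"), INF)

    source, sink, flow = (s, "out"), (t, "in"), 0.0
    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = INF, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        if bottleneck == INF:  # a direct s -> t certificate
            return INF
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
```

With a constant capacity of 1 (a hypothetical choice of f and g), the flow counts node-disjoint certification paths. Two independent certifiers give a flow of 2, while a graph in which every path passes through one intermediary gives 1, however many paths there are. This matches the earlier point that a good metric should require truly independent sources of certification.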
50. Trust Metrics on Network Certificates
Results
The Maximum Network Flow metric is as effective as previously suggested
approaches against node attacks but is far more resistant to edge attacks.
52. Dataset Table*
| Paper | Existing dataset | Suggested dataset | Reason |
|---|---|---|---|
| TrustRank | AltaVista | Google | Better representation of the web, as Google is used by more users. |
| Trust & Distrust | Epinions | Amazon reviews | Evaluation on a large network; low votes can be counted as distrust. |
| EigenTrust in P2P | Simulation | Gnutella peer-to-peer network | Evaluate the consistency of the system on a large network. |
| Digital Certificates | PGP key database (certificate graph) | Ego-Facebook / email-EuAll / email-Enron | Evaluate how resistant a community is to circulating malicious information and to infiltration. |
* All suggested datasets can be found in SNAP [5]
53. Conclusion
● Trust is an important aspect that should not be missing from the social web;
● TrustRank can successfully separate reputable pages from spam in a search
engine;
● Distrust is a significant signal that should not be ignored, as it complements
trust and improves the accuracy of predictions;
● Malicious peers can be identified and isolated with the EigenTrust algorithm,
based on the authenticity of the files they upload;
● The Maximum Network Flow metric can be used to evaluate the attack resistance
of a certificate network.
54. References
1. Gyöngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004, August). Combating web spam with TrustRank.
In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30 (pp.
576-587). VLDB Endowment.
2. Guha, R., Kumar, R., Raghavan, P. & Tomkins, A. (2004, May). Propagation of Trust and Distrust. In
Proceedings of the 13th international conference on World Wide Web (pp. 403-412).
3. Kamvar, S. D., Schlosser, M. T., & Garcia-Molina, H. (2003, May). The EigenTrust algorithm for
reputation management in p2p networks. In Proceedings of the 12th international conference on
World Wide Web (pp. 640-651). ACM.
4. Levien, R., & Aiken, A. (1998, January). Attack-Resistant Trust Metrics for Public Key Certification. In
Usenix Security.
5. Stanford Network Analysis Project: http://snap.stanford.edu/