Transitivity of Trust
Team
Founta Antigoni-Maria, UID: 647
Kouslis Ilias, UID: 650
Moutidis Iraklis, UID: 636
Spathis Dimitris, UID: 640
OVERVIEW
● Introduction
● Task Definition
● Schema
● Combating web spam with TrustRank.
● Propagation of Trust and Distrust
● The EigenTrust algorithm for reputation management in p2p networks
● Attack-Resistant Trust Metrics for Public Key Certification
● Dataset Suggestions
● Conclusion
Introduction
● Given the open nature of social networks and their current level of popularity,
users are increasingly concerned about privacy and security;
● We need to trust the entities that belong to our social network;
● To achieve that, a “Web of Trust” should be introduced;
● In order to balance the open nature of social networks and safeguard the
privacy concerns of users, it is important to build “Trust Communities”.
Task Definition [2]
Challenges
● Users sometimes adopt many personas and express a large number of biased opinions.
● Trust is difficult to define.
Importance
● In e-commerce, a trust model can increase the value of a product.
● Trusted users gain greater influence and perks, which can have a positive effect on user behaviour.
● Better recommendations.
Applications
Internet networks:
● Social networks
● P2P networks
● Certificate networks
● Mail networks
Schema
Web of Trust
● Methodologies: TrustRank, Trust & Distrust
● Case studies: EigenTrust on P2P, Digital Certificates
Gyöngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004, August).
Combating web spam with TrustRank.
In Proceedings of the Thirtieth international conference on Very large data
bases-Volume 30 (pp. 576-587). VLDB Endowment.
TrustRank overview
Gyöngyi et al. propose techniques to semi-automatically separate reputable web
pages from spam. First, a small set of seed pages is selected and evaluated by a
human expert. Once the reputable seed pages have been identified manually, the
link structure of the web is used to discover other pages that are likely to be
good as well. The benchmark dataset is AltaVista’s web index as of 2003.
Contribution
1. Formalization of web spam problem and detection algorithms.
2. Metrics defined for assessing the efficacy of detection algorithms.
3. Schemes for selecting seed sets of pages to be manually evaluated.
4. Introduction of TrustRank algorithm for determining the likelihood that pages
are reputable.
5. An extensive evaluation, based on 31 million sites crawled by the AltaVista
search engine, and a manual examination of over 2,000 sites.
Assessing trust
The creators of good pages can
sometimes be “tricked,” so we do
find some good-to-bad links on the
web.
Assessing trust
Oracle function
O(p) = 0 if p is bad, 1 if p is good
Trust function
T(p) = Pr[O(p) = 1]
Ordered Trust Property
T(p) < T(q) ⇔ Pr[O(p) = 1] < Pr[O(q) = 1]
T(p) = T(q) ⇔ Pr[O(p) = 1] = Pr[O(q) = 1]
Threshold value δ
T(p) > δ ⇔ O(p) = 1
Evaluation
Metrics: pairwise orderedness, precision, recall
Computing trust
Ignorant trust function
T(p) = O(p) if p ∈ S, 1/2 otherwise
A randomly selected seed set
S = {1,3,6}
Oracle vector
o = [1, 1, 1, 1, 0, 0, 0]
Trust vector
t = [1, 1/2 , 1, 1/2 , 1/2 , 0, 1/2 ].
7 · 6 = 42 ordered pairs
Computing trust
Pairwise orderedness of t: 17/21
With threshold δ = 1/2:
Precision = 1
Recall = 1/2
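The numbers on this slide can be checked with a short script. This is a minimal sketch, not the authors' code: the violation rule inside `pairwise_orderedness` is an assumption based on the paper's definition (a pair counts against the trust function when the page with lower oracle value gets an equal or higher trust score), and the function names are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Oracle and ignorant trust vectors from the slide (pages 1..7, stored 0-based).
oracle = [1, 1, 1, 1, 0, 0, 0]
half = Fraction(1, 2)
trust = [Fraction(1), half, Fraction(1), half, half, Fraction(0), half]


def pairwise_orderedness(trust, oracle):
    """Fraction of ordered pairs (p, q) that the trust scores do not misorder."""
    pairs = list(permutations(range(len(oracle)), 2))
    violations = sum(
        1
        for p, q in pairs
        if (trust[p] >= trust[q] and oracle[p] < oracle[q])
        or (trust[p] <= trust[q] and oracle[p] > oracle[q])
    )
    return Fraction(len(pairs) - violations, len(pairs))


def precision_recall(trust, oracle, delta):
    """Precision and recall of the rule 'page is good if T(p) > delta'."""
    flagged = [p for p in range(len(oracle)) if trust[p] > delta]
    hits = [p for p in flagged if oracle[p] == 1]
    return Fraction(len(hits), len(flagged)), Fraction(len(hits), sum(oracle))


print(pairwise_orderedness(trust, oracle))    # 17/21
print(precision_recall(trust, oracle, half))  # (Fraction(1, 1), Fraction(1, 2))
```

Only pages 1 and 3 score above δ = 1/2 and both are good, which gives precision 1; two of the four good pages are found, which gives recall 1/2.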
Trust Attenuation
The further away we are from good seed pages,
the less certain we are that a page is good. For
instance, in Figure 2 of the paper there are 2 pages
(namely, pages 2 and 4) that are at most 2 links away
from the good seed pages. As both of them are
good, the probability that we reach a good page
in at most 2 steps is 1. Similarly, the number of
pages reachable from the good seed in at most
3 steps is 3. Only two of these (pages 2 and 4) are
good, while page 5 is bad, so the probability of
finding a good page drops to 2/3. These observations
suggest that we reduce trust as we move further and
further away from the good seed pages.
Trust Attenuation
Trust dampening. Since page 2 is one link away from
the good seed page 1, we assign it a dampened trust
score of β, where β < 1. Since page 3 is reachable in
one step from page 2 with score β, it gets a dampened
score of β · β.
Trust splitting. If a good page has only a handful of
outlinks, then it is likely that the pointed pages are also
good. However, if a good page has hundreds of
outlinks, it is more probable that some of them will
point to bad pages.
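The two attenuation ideas can be written as one-line rules. The sketch below is illustrative only: the function names and the default β = 0.85 are assumptions, not values taken from the slide.

```python
def dampened_trust(distance, beta=0.85):
    """Trust dampening: a page `distance` links away from a good seed gets beta**distance."""
    return beta ** distance


def split_trust(trust, out_degree):
    """Trust splitting: a page divides its trust equally among its outlinks."""
    return trust / out_degree if out_degree else 0.0


# Page 2 is one link from the seed, page 3 is two links away.
print(dampened_trust(1, beta=0.5), dampened_trust(2, beta=0.5))  # 0.5 0.25
# A good page with many outlinks passes little trust to each target.
print(split_trust(1.0, 4))  # 0.25
```

TrustRank combines both ideas: each step multiplies by a decay factor and divides by the out-degree of the linking page.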
TrustRank
selectSeed: inverse PageRank is used to choose the best seeds
s = [0.08, 0.13, 0.08, 0.10, 0.09, 0.06, 0.02]
σ = [2, 4, 5, 1, 3, 6, 7]
(L = 3, seed set is {2, 4, 5})
d = [0, 1/2, 0, 1/2, 0, 0, 0]
(α_B = 0.85 and M_B = 20)
t* = [0, 0.18, 0.12, 0.15, 0.13, 0.05, 0.05]
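The final step above is a biased PageRank: trust starts at the seed distribution d and is repeatedly pushed along outlinks with decay α_B for M_B iterations. A minimal sketch follows. The four-node graph is hypothetical (it is not the graph of the paper's Figure 2, so the scores will not match t*), and the function name `trustrank` is chosen here for illustration.

```python
def trustrank(out_links, seed_dist, alpha=0.85, iterations=20):
    """Iterate t <- alpha * T * t + (1 - alpha) * d, starting from t = d.

    out_links maps each node to the list of nodes it links to; a node
    splits its trust equally among its outlinks (trust splitting).
    Nodes without outlinks simply pass nothing on in this sketch.
    """
    t = list(seed_dist)
    for _ in range(iterations):
        nxt = [(1 - alpha) * d for d in seed_dist]
        for q, targets in out_links.items():
            if targets:
                share = alpha * t[q] / len(targets)
                for p in targets:
                    nxt[p] += share
        t = nxt
    return t


# Hypothetical graph: node 0 is the only good seed, node 3 has no inlinks.
out_links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
seed_dist = [1.0, 0.0, 0.0, 0.0]
scores = trustrank(out_links, seed_dist)
print(scores)
```

Node 3 ends with a score of exactly 0 because it is neither a seed nor the target of any link. The same effect leaves page 1 with a score of 0 in the slide's example.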
TrustRank
TrustRank usually gives good pages a higher score. In particular, three
of the four good pages (namely, pages 2, 3, and 4) got high scores and
two of the three bad pages (pages 6 and 7) got low scores. However,
the algorithm failed to assign pages 1 and 5 adequate scores. Page 1
was not among the seeds, and it did not have any inlinks through which
to accumulate score, so its score remained at 0. All good unreferenced
web pages receive a similar treatment, unless they are selected as
seeds. Bad page 5 received a high score because it is the direct target
of one of the rare good-to-bad links.
Experiments
To evaluate the algorithms, the authors performed experiments using the
complete set of pages crawled and indexed by the AltaVista search
engine as of August 2003. To reduce computational demands,
they worked with web sites instead of individual pages. They grouped
the several billion pages into 31,003,946 sites, using a proprietary
algorithm that is part of the AltaVista engine. More than one third of the
sites (13,197,046) were unreferenced. The first author of the paper
played the role of the oracle, examining pages of various sites,
determining whether they are spam, and performing additional classification.
The manual evaluations took weeks.
Evaluation
A sample of 1000 sites was used, not selected uniformly at random.
With a random sample, a great number
of the sites would be very small (with
few pages) and/or have very low
PageRank. It is more important to
correctly detect spam in high-PageRank
sites, since they more often appear
high in query result sets.
Evaluation
Virtually no spam in the top 5 TrustRank buckets, while there is
a marked increase in spam concentration in the lower buckets.
At the same time, it is surprising that almost 20% of the second
PageRank bucket is bad.
Precision & Recall
TrustRank assigned the highest scores to good sites, and the
proportion of bad sites increases gradually as we move to lower
scores. Hence, precision and recall manifest an almost linear
decrease and increase, respectively.
Conclusion
Experimental results show that we can effectively
identify a significant number of strongly reputable
(non-spam) pages. In a search engine, TrustRank can be
used either separately to filter the index, or in
combination with PageRank and other metrics to rank
search results.
Guha, R., Kumar, R., Raghavan, P. & Tomkins, A (2004, May).
Propagation of Trust and Distrust.
In Proceedings of the 13th international conference on World Wide Web (pp.
403-412).
Propagation of Trust and Distrust
Guha et al. develop a formal framework of propagation schemes that uses both
trust and distrust to estimate the “belief” of a user in any other
user.
● Why Distrust?
Distrust is at least as important as trust when modelling one user's opinion of
another. Their results show that using distrust in retailing and
recommendation networks improves the accuracy of the predictions.
Distrust Challenges
Challenge 1: How to model
“Does a trust score of 0 translate to distrust or to ‘no opinion’?” [2]
Challenge 2: Chain distrust
How can one apply distrust along a chain of users? What if there is a chain of
distrust?
Challenge 3: Algorithmic challenges
Once distrust is included, the principal eigenvector of the trust matrix need
not be real, and the matrix can no longer be read as a Markov chain, which
raises algorithmic issues.
Fundamentals
Atomic Propagation:
● Direct Propagation
● Co-citation
● Transpose trust
● Trust coupling
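Each atomic propagation corresponds to a matrix operator on the belief matrix B. In the paper these are B (direct), BᵀB (co-citation), Bᵀ (transpose trust) and BBᵀ (trust coupling), combined with a weight vector α. The sketch below builds that combined matrix in plain Python. The helper names and the three-user example are made up for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def combined_operator(b, alpha):
    """C = a1*B + a2*(B^T B) + a3*B^T + a4*(B B^T)."""
    bt = transpose(b)
    ops = [b, matmul(bt, b), bt, matmul(b, bt)]  # direct, co-citation, transpose, coupling
    n = len(b)
    return [
        [sum(w * op[i][j] for w, op in zip(alpha, ops)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: user 0 trusts users 1 and 2, user 2 trusts user 1.
B = [[0, 1, 1],
     [0, 0, 0],
     [0, 1, 0]]

direct_only = combined_operator(B, (1, 0, 0, 0))
co_citation = combined_operator(B, (0, 1, 0, 0))
print(co_citation[1][2])  # 1: users 1 and 2 are both trusted by user 0
```

With α = e_1 the combined matrix is B itself. With α = e_2 the entry for (1, 2) counts how many users trust both 1 and 2.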
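Methodology
● Input: trust matrix T and distrust matrix D
● Belief matrix B, built from T and D in one of three ways:
○ Trust only
○ One-step distrust
○ Propagated distrust
● Propagation process: apply the combined operator C_{B,α} k times to obtain P^{(k)}, with weight vector α:
○ Direct-only: α = e_1
○ Co-citation: α = e_2
○ Combined (all 4): α = (0.4, 0.4, 0.1, 0.1)
● Final matrix F:
○ EIG: F = P^{(k)}
○ WLC: weighted linear combination with constant γ (γ = 0.5 or γ = 0.9)
● Rounding:
○ Global
○ Local
○ Majority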
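A short sketch of the propagation step for the one-step distrust model, under the assumption (taken from the paper, not from the slide) that it sets B = T and P^{(k)} = C^k · (T − D). Direct-only propagation (α = e_1) is used, so C = T. The four-user matrices are hypothetical and rounding is left out.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def propagate(c, start, k):
    """Return C^k * start by left-multiplying with C k times."""
    result = start
    for _ in range(k):
        result = matmul(c, result)
    return result


# Hypothetical data: 0 trusts 1, 1 trusts 2, 1 distrusts 3.
T = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
D = [[0, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

# One-step distrust with direct propagation only: C = T, P(1) = C * (T - D).
P1 = propagate(T, subtract(T, D), k=1)
print(P1[0][2], P1[0][3])  # 1 -1
```

User 0 inherits trust in user 2 and distrust of user 3 through user 1. A rounding step (global, local or majority) would then turn such scores into trust or distrust labels.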
Data
● Directed graph from Epinions
● 131,829 nodes and 841,372 edges
Edge labels: trust or distrust
(85% trust edges)
● Distribution: power law
● Structure: symmetric bow-tie
SCC of 41,500 nodes
40,000 in SCC / 30,000 out of SCC
● Giant WCC of ~120,000 nodes
81 different schemes
Best combination:
● k = 20
● α = e* (combined)
● Majority rounding
● EIG
● One-step distrust
Error (incorrect predictions):
e = 0.064 and e_S = 0.147
Results
Kamvar, S. D., Schlosser, M. T., & Garcia-Molina, H. (2003, May).
The EigenTrust algorithm for reputation management in p2p networks.
In Proceedings of the 12th international conference on World Wide Web (pp.
640-651). ACM.
EigenTrust Overview
1. Reputation system on a P2P Network
2. Trust level
3. Robust against malicious peers and Freeriders
4. Reward good behavior through several transactions
Trust level
A Peer will Trust:
1. Peers who have provided him authentic files.
2. Their opinions about other files.
3. Known trustworthy Peers.
Estimating Trust
● Terminology
○ Local trust value c_{ij}
■ The opinion peer i has of peer j, based on past experience
■ Each time peer i downloads an authentic or inauthentic file from peer j, c_{ij} increases or
decreases, respectively
○ Global trust value t_i
■ The trust that the entire system places in peer i
Estimating Trust Level
● Normalization of c_{ij}:
otherwise, malicious peers can assign arbitrarily high local
trust values to other malicious peers
● Local trust vector c_i:
contains all local trust values c_{ij} that peer i has of other peers j
● Iterative friend-of-friend reference:
○ Ask your friends: t = C^T c_i
○ Ask their friends: t = (C^T)^2 c_i
○ Keep asking until all nodes are covered: t = (C^T)^n c_i
○ For large n, t converges to the same vector for every peer i
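The friend-of-friend iteration can be tried on a toy network. This is a sketch under assumptions: the normalization c_ij = max(s_ij, 0) / Σ_j max(s_ij, 0) follows the paper, while the raw scores `s`, the function names and the round count are invented for the example.

```python
def normalize_local_trust(s):
    """Row-normalize raw scores: c_ij = max(s_ij, 0) / sum_j max(s_ij, 0)."""
    c = []
    for row in s:
        clipped = [max(v, 0.0) for v in row]
        total = sum(clipped)
        # A peer with no positive experience keeps an all-zero row in this sketch.
        c.append([v / total for v in clipped] if total > 0 else clipped)
    return c


def ask_friends(c, i, rounds):
    """Return (C^T)^rounds * c_i, the view of peer i after asking `rounds` levels of friends."""
    n = len(c)
    t = list(c[i])
    for _ in range(rounds):
        t = [sum(c[k][j] * t[k] for k in range(n)) for j in range(n)]
    return t


# Hypothetical raw scores between three peers.
s = [[0, 4, 1],
     [2, 0, 2],
     [3, 1, 0]]
c = normalize_local_trust(s)

view_0 = ask_friends(c, 0, rounds=200)
view_1 = ask_friends(c, 1, rounds=200)
print(view_0)
print(view_1)
```

After enough rounds both peers arrive at the same global trust vector, which is the convergence property stated in the last bullet.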
Practical issues and solutions
● A priori notions of trust
○ Define some distribution p over pre-trusted peers
● Inactive peers
○ If peer i does not download from anybody else, or assigns a zero score to all other
peers, its local trust vector is redefined as p, so it defaults to trusting the pre-trusted peers
● Malicious collectives
○ This is addressed by having each peer place at least some trust in peers that are not part
of a collective
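The three fixes above fit into one update rule, t ← (1 − a) · Cᵀ t + a · p, started from t = p. The following is a minimal sketch: the weight a = 0.15, the stopping threshold and the four-peer scores are assumptions made for the example, not values from the slides.

```python
def eigentrust(s, p, a=0.15, eps=1e-12, max_rounds=1000):
    """Global trust with pre-trusted peers: t <- (1 - a) * C^T t + a * p."""
    n = len(s)
    c = []
    for row in s:
        clipped = [max(v, 0.0) for v in row]
        total = sum(clipped)
        # Inactive peers fall back to the pre-trusted distribution p.
        c.append([v / total for v in clipped] if total > 0 else list(p))
    t = list(p)
    for _ in range(max_rounds):
        nxt = [
            (1 - a) * sum(c[k][j] * t[k] for k in range(n)) + a * p[j]
            for j in range(n)
        ]
        change = sum(abs(x - y) for x, y in zip(nxt, t))
        t = nxt
        if change < eps:
            break
    return t


# Hypothetical network: peer 0 is pre-trusted, peer 1 is inactive,
# peers 2 and 3 form a malicious collective that only rates each other.
s = [[0, 5, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 9],
     [0, 0, 9, 0]]
p = [1.0, 0.0, 0.0, 0.0]
t = eigentrust(s, p)
print(t)
```

The collective ends with zero global trust: trust only enters the system through p, and no peer outside the collective has rated its members.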
Distributed EigenTrust
● Each peer stores its local trust vector c_i
● Each peer computes and stores its own global trust value t_i
● The update includes the pre-trusted distribution p
Secure EigenTrust
● A peer should not hold its own t
○ Problem: a malicious peer can report a false value
○ Solution: a different peer computes t for this peer
● t should not be computed by only one peer
○ Problem: a malicious peer can report a false value for another peer
○ Solution: multiple score managers
Experiments
The performance of this scheme is assessed through simulations of a P2P
network. The network typically has 100 peers, connected according to a
power-law model. Several threat models are run on this network.
Malicious Peers
Malicious Collectives
Levien, R., & Aiken, A. (1998, January).
Attack-Resistant Trust Metrics for Public Key Certification.
In Usenix Security.
Trust Metrics on Network Certificates
Certificate Applications:
● Authentication
● Data Integrity
● Encryption
Trust Metrics on Network Certificates
Using digitally signed certificates, a directed graph is formed. This graph
serves as the model on which a number of trust metrics are deployed and tested,
measuring the attack resistance of a given certificate network.
Two types of certificates:
● Binding certificates: “I believe that subject key k is the key belonging
to name n”
● Delegation certificates: “I trust certificates signed
by subject key k”
Trust Metrics on Network Certificates
A good trust metric ensures that there are really multiple
independent sources of certification, and rejects assertions with
insufficient certification.
No trust metric can protect against attacks on d keys or more,
where d is the minimum number of certifiers on any widely
accepted key.
Trust Metrics on Network Certificates
Attack Types:
● Node attack: the attacker is able to generate any certificate
from the attacked key. (stolen password)
● Edge attack: the attacker is only able to generate a delegation
certificate from the attacked key. (convince key owner)
Trust Metrics on Network Certificates
Maximum Network Flow Metric
Each node n in the graph is assigned a capacity
C_{(s,t)}(n) = max(f_s(dist(s, n)), g_t(dist(n, t)))
where s = source, t = target, dist(n, t) = length of the shortest path from n to t, d = degree
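A flow-based metric of this kind can be sketched as a maximum flow with node capacities. The example below is an illustration, not the authors' implementation: node capacities are handled by splitting each node into an "in" and an "out" half, the flow is found with breadth-first augmenting paths, and the capacities are given directly instead of being derived from f_s and g_t. The graph and all names are hypothetical.

```python
from collections import defaultdict, deque


def max_flow_node_capacities(edges, capacity, source, target):
    """Max flow from source to target where node n carries at most capacity[n] units."""
    big = sum(capacity.values()) + 1  # finite stand-in for "unbounded" edges
    residual = defaultdict(lambda: defaultdict(int))
    for n, cap in capacity.items():
        residual[(n, "in")][(n, "out")] += cap  # node capacity lives on this arc
    for u, v in edges:
        residual[(u, "out")][(v, "in")] += big
    s, t = (source, "in"), (target, "out")
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = big, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical certificate graph: two independent certifiers a and b.
capacity = {"s": 5, "a": 1, "b": 1, "t": 5, "x1": 1, "x2": 1}
edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
honest = max_flow_node_capacities(edges, capacity, "s", "t")

# Node attack on a: the attacker adds certificates through fake keys x1 and x2.
attacked_edges = edges + [("a", "x1"), ("a", "x2"), ("x1", "t"), ("x2", "t")]
attacked = max_flow_node_capacities(attacked_edges, capacity, "s", "t")
print(honest, attacked)  # 2 2
```

Because the capacity of the compromised node a is 1, the extra certificates it issues do not increase the flow that reaches the target.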
Trust Metrics on Network Certificates
Results
Maximum Network Flow Metric is as effective as previously
suggested approaches for node attacks but is far more
resistant to edge attacks.
Dataset Table*
Paper | Existing Dataset | Suggested Dataset | Reason
TrustRank | AltaVista | Google | Better representation of the web by Google, as it is used by more users.
Trust & Distrust | Epinions | Amazon reviews | Evaluation on a large network; a low number of votes and people can be counted as distrust.
EigenTrust in P2P | Simulation | Gnutella peer-to-peer network | Evaluate the consistency of the system on a large network.
Digital Certificates | PGP key database (certificate graph) | ego-Facebook / email-EuAll / email-Enron | Evaluate a community's resistance to the circulation of malicious information and to infiltration.
* All suggested datasets can be found in SNAP [5]
Conclusion
● Trust is an important aspect that should not be missing from the social web;
● Reputable pages can be separated from spam in a search engine
using TrustRank;
● Distrust is a significant signal that should not be ignored, as it complements
trust and can improve the performance of an approach;
● Malicious peers can be identified and isolated with the EigenTrust algorithm,
based on the files that each peer uploads;
● The attack resistance of a network can be evaluated using the
Maximum Network Flow metric.
References
1. Gyöngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004, August). Combating web spam with TrustRank. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30 (pp. 576-587). VLDB Endowment.
2. Guha, R., Kumar, R., Raghavan, P., & Tomkins, A. (2004, May). Propagation of Trust and Distrust. In Proceedings of the 13th international conference on World Wide Web (pp. 403-412).
3. Kamvar, S. D., Schlosser, M. T., & Garcia-Molina, H. (2003, May). The EigenTrust algorithm for reputation management in p2p networks. In Proceedings of the 12th international conference on World Wide Web (pp. 640-651). ACM.
4. Levien, R., & Aiken, A. (1998, January). Attack-Resistant Trust Metrics for Public Key Certification. In Usenix Security.
5. Stanford Network Analysis Project: http://snap.stanford.edu/
Any questions?
Thank you!

  • 4. Task Definition [2] Challenges: users sometimes adopt many personas and express a large number of biased opinions; trust is difficult to define. Importance: in e-commerce, a trust model can increase the value of a product; trusted users gain greater influence and perks, which can have a positive effect on user behaviour; better recommendations. Applications (Internet networks): ● Social networks ● P2P networks ● Certificate networks ● Mail networks
  • 5. Schema Web of Trust: Methodologies (TrustRank, Trust & Distrust) and Case Studies (EigenTrust on P2P, Digital Certificates)
  • 7. Gyöngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004, August). Combating web spam with TrustRank. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30 (pp. 576-587). VLDB Endowment.
  • 8. TrustRank overview Gyöngyi et al. proposed techniques to semi-automatically separate reputable web pages from spam. They first select a small set of seed pages to be evaluated by a human. Once the reputable seed pages have been identified manually, they exploit the link structure of the web to discover other pages that are likely to be good as well. The benchmark dataset is AltaVista’s web index as of 2003.
  • 9. Contribution 1. Formalization of web spam problem and detection algorithms. 2. Metrics defined for assessing the efficacy of detection algorithms. 3. Schemes for selecting seed sets of pages to be manually evaluated. 4. Introduction of TrustRank algorithm for determining the likelihood that pages are reputable. 5. An extensive evaluation, based on 31 million sites crawled by the AltaVista search engine, and a manual examination of over 2,000 sites.
  • 10. Assessing trust The creators of good pages can sometimes be “tricked,” so we do find some good-to-bad links on the web.
  • 11. Assessing trust Oracle function: O(p) = 0 if p is bad, 1 if p is good. Trust function: T(p) = Pr[O(p) = 1]. Ordered trust property: T(p) < T(q) ⇔ Pr[O(p) = 1] < Pr[O(q) = 1], and T(p) = T(q) ⇔ Pr[O(p) = 1] = Pr[O(q) = 1]. Threshold value δ: T(p) > δ ⇔ O(p) = 1.
  • 13. Computing trust Ignorant trust function: T(p) = O(p) if p ∈ S, 1/2 otherwise. For a randomly selected seed set S = {1, 3, 6}, the oracle vector is o = [1, 1, 1, 1, 0, 0, 0] and the trust vector is t = [1, 1/2, 1, 1/2, 1/2, 0, 1/2]. There are 7 · 6 = 42 ordered pairs.
  • 14. Computing trust Pairwise orderedness of T: 17/21. With threshold δ = 1/2: precision = 1, recall = 1/2.
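The figures on the two "Computing trust" slides can be checked with a short sketch. The slides do not spell out when a pair counts as mis-ordered, so the rule used here is an assumption: a pair (p, q) is a violation when the trust ordering (ties included) contradicts a strict oracle ordering. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

# Values from the slides: oracle vector o and ignorant trust vector t (pages 1..7).
oracle = [1, 1, 1, 1, 0, 0, 0]
half = Fraction(1, 2)
trust = [Fraction(1), half, Fraction(1), half, half, Fraction(0), half]


def pairwise_orderedness(trust, oracle):
    """Fraction of ordered pairs (p, q), p != q, that trust does not mis-order."""
    pairs = list(permutations(range(len(oracle)), 2))
    violations = 0
    for p, q in pairs:
        # Assumed rule: ties in trust count against a strict oracle ordering.
        if trust[p] >= trust[q] and oracle[p] < oracle[q]:
            violations += 1
        elif trust[p] <= trust[q] and oracle[p] > oracle[q]:
            violations += 1
    return Fraction(len(pairs) - violations, len(pairs))


def precision_recall(trust, oracle, delta):
    """Treat pages with trust > delta as predicted good and score the prediction."""
    flagged_good = [i for i, score in enumerate(trust) if score > delta]
    true_good = sum(oracle[i] for i in flagged_good)
    precision = Fraction(true_good, len(flagged_good))
    recall = Fraction(true_good, sum(oracle))
    return precision, recall
```

Under this rule, 8 of the 42 ordered pairs are violations, which gives 34/42 = 17/21; only pages 1 and 3 exceed δ = 1/2, so precision is 1 and recall is 2/4.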
  • 15. Trust Attenuation The further away we are from good seed pages, the less certain we are that a page is good. For instance, in Figure 2 there are 2 pages (namely, pages 2 and 4) that are at most 2 links away from the good seed pages. As both of them are good, the probability that we reach a good page in at most 2 steps is 1. Similarly, the number of pages reachable from the good seed in at most 3 steps is 3. Only two of these (pages 2 and 4) are good, while page 5 is bad, so the probability of reaching a good page drops to 2/3. These observations suggest that we reduce trust as we move further and further away from the good seed pages.
  • 16. Trust Attenuation Trust dampening. Since page 2 is one link away from the good seed page 1, we assign it a dampened trust score of β, where β < 1. Since page 3 is reachable in one step from page 2 with score β, it gets a dampened score of β · β. Trust splitting. If a good page has only a handful of outlinks, then it is likely that the pointed pages are also good. However, if a good page has hundreds of outlinks, it is more probable that some of them will point to bad pages.
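Both attenuation ideas can be sketched in a few lines. This is a minimal illustration, not the paper's code: the graph, the function names, and the value β = 0.85 are assumptions chosen for the example.

```python
from collections import deque


def dampened_trust(graph, seeds, beta):
    """Trust dampening: a page d links from the nearest good seed gets beta**d."""
    scores = {seed: 1.0 for seed in seeds}
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        page, dist = queue.popleft()
        for target in graph.get(page, []):
            if target not in scores:
                scores[target] = beta ** (dist + 1)
                queue.append((target, dist + 1))
    return scores


def split_trust(graph, page, score):
    """Trust splitting: a page divides its score equally among its outlinks."""
    targets = graph.get(page, [])
    return {target: score / len(targets) for target in targets}


# Hypothetical link structure: 1 -> 2 -> 3, plus a hub page 9 with four outlinks.
graph = {1: [2], 2: [3], 9: [10, 11, 12, 13]}
scores = dampened_trust(graph, seeds=[1], beta=0.85)
shares = split_trust(graph, page=9, score=1.0)
```

With seed page 1, page 2 receives β and page 3 receives β · β, as on the slide; the hub page passes only a quarter of its score to each of its four targets.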
  • 17. TrustRank SelectSeed: inverse PageRank is used to choose the best seeds
  • 18. s = [0.08, 0.13, 0.08, 0.10, 0.09, 0.06, 0.02]; σ = [2, 4, 5, 1, 3, 6, 7] (L = 3, seed set is {2, 4, 5}); d = [0, 1/2, 0, 1/2, 0, 0, 0] (α_B = 0.85 and M_B = 20); t* = [0, 0.18, 0.12, 0.15, 0.13, 0.05, 0.05]
  • 19. TrustRank TrustRank usually gives good pages a higher score. In particular, three of the four good pages (namely, pages 2, 3, and 4) got high scores and two of the three bad pages (pages 6 and 7) got low scores. However, the algorithm failed to assign pages 1 and 5 adequate scores. Page 1 was not among the seeds, and it did not have any inlinks through which to accumulate score, so its score remained at 0. All good unreferenced web pages receive a similar treatment, unless they are selected as seeds. Bad page 5 received a high score because it is the direct target of one of the rare good-to-bad links.
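The behaviour described above can be reproduced qualitatively with a biased-PageRank sketch. This is an assumption-laden illustration: the update rule t ← α · (split trust from in-neighbours) + (1 − α) · d is the usual biased PageRank form, the 7-page graph is hypothetical (it is not claimed to be the paper's Figure 2, so the scores differ from t* on the previous slide), and the function name is illustrative.

```python
def trustrank(out_links, good_seeds, alpha=0.85, iterations=20):
    """Biased PageRank: trust flows from good seeds along outlinks.

    out_links maps every page to the list of pages it links to.
    """
    pages = list(out_links)
    # Static score distribution d: uniform over the good seeds, zero elsewhere.
    d = {p: (1.0 / len(good_seeds) if p in good_seeds else 0.0) for p in pages}
    t = dict(d)
    for _ in range(iterations):
        nxt = {p: (1.0 - alpha) * d[p] for p in pages}
        for p in pages:
            targets = out_links[p]
            for q in targets:
                # Splitting (divide by out-degree) and dampening (alpha) in one step.
                nxt[q] += alpha * t[p] / len(targets)
        t = nxt
    return t


# Hypothetical graph: pages 1-4 good, 5-7 bad, with one good-to-bad link 4 -> 5.
out_links = {1: [2], 2: [3, 4], 3: [2], 4: [5], 5: [6, 7], 6: [], 7: []}
scores = trustrank(out_links, good_seeds={2, 4})
```

In this toy graph the unreferenced page 1 stays at 0 because it is not a seed, while bad page 5 inherits a comparatively high score through the single good-to-bad link, mirroring the two failure cases the slide mentions.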
  • 20. Experiments To evaluate the algorithms, the authors performed experiments using the complete set of pages crawled and indexed by the AltaVista search engine as of August 2003. To reduce computational demands, they worked with web sites instead of individual pages. They grouped the several billion pages into 31,003,946 sites, using a proprietary algorithm that is part of the AltaVista engine. More than one third of the sites (13,197,046) were unreferenced. The first author of the paper played the role of the oracle, examining pages of various sites, determining whether they were spam, and performing additional classification. The manual evaluations took weeks.
  • 21. Evaluation A sample of 1000 sites was evaluated, and it was not chosen at random. With a random sample, a great number of the sites would be very small (with few pages) and/or have very low PageRank. It is more important to correctly detect spam in high-PageRank sites, since they will more often appear high in query result sets.
  • 22. Evaluation Virtually no spam in the top 5 TrustRank buckets, while there is a marked increase in spam concentration in the lower buckets. At the same time, it is surprising that almost 20% of the second PageRank bucket is bad.
  • 23. Precision & Recall TrustRank assigned the highest scores to good sites, and the proportion of bad sites increases gradually as we move to lower scores. Hence, precision and recall show an almost linear decrease and increase, respectively.
  • 24. Conclusion Experimental results show that we can effectively identify a significant number of strongly reputable (non-spam) pages. In a search engine, TrustRank can be used either separately to filter the index, or in combination with PageRank and other metrics to rank search results.
  • 25. Guha, R., Kumar, R., Raghavan, P., & Tomkins, A. (2004, May). Propagation of Trust and Distrust. In Proceedings of the 13th international conference on World Wide Web (pp. 403-412).
  • 26. Propagation of Trust and Distrust Guha et al. develop a formal framework of propagation schemes, using both trust and distrust, to measure the “belief” of a user in any other user. ● Why distrust? Distrust matters at least as much as trust in describing one user's opinion of another. Their results show that using distrust in retailing / recommendation networks is of significant use and improves the accuracy of the predictions.
  • 27. Distrust Challenges Challenge 1, how to model distrust: “Does a trust score of 0 translate to distrust or to ‘no opinion’?” [2] Challenge 2, chain distrust: how can one apply distrust along a chain of users? What if there is a chain of distrust? Challenge 3, algorithmic challenges: the principal eigenvector of a trust matrix that includes distrust need not be real, which raises algorithmic issues (e.g., when the matrix is converted to a Markov chain).
  • 28. Fundamentals Atomic Propagation: ● Direct Propagation ● Co-citation ● Transpose trust ● Trust coupling
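  • 29. Methodology Pipeline: trust and distrust matrices T and D → belief matrix B → propagation (apply the combined operator C_{B,α} k times) → P^(k) → final matrix F → rounding. Distrust handling: > Trust only > One-step distrust > Propagated distrust. Atomic propagation weights: > Direct-only: α = e1 > Co-citation: α = e2 > Combined (all 4): α = (0.4, 0.4, 0.1, 0.1). Iteration: > EIG: F = P^(k) > WLC: weighted linear combination with constant γ (γ = 0.5 / γ = 0.9). Rounding: > Global > Local > Majority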
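The propagation step can be sketched with plain nested lists. The sketch assumes the paper's operator definitions as best recalled here: the four atomic propagations correspond to B, BᵀB, Bᵀ and BBᵀ, and one-step distrust computes P^(k) = C^k (T − D) with C built from B = T. The helper names and the four-user example are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def combined_operator(b, alpha):
    """C = a1*B + a2*B^T B + a3*B^T + a4*B B^T (the four atomic propagations)."""
    bt = transpose(b)
    terms = [b, matmul(bt, b), bt, matmul(b, bt)]
    n = len(b)
    return [[sum(w * t[i][j] for w, t in zip(alpha, terms)) for j in range(n)]
            for i in range(n)]


def propagate_one_step_distrust(trust, distrust, alpha, k):
    """One-step distrust: P^(k) = C^k (T - D), with C built from B = T."""
    c = combined_operator(trust, alpha)
    n = len(trust)
    p = [[trust[i][j] - distrust[i][j] for j in range(n)] for i in range(n)]
    for _ in range(k):
        p = matmul(c, p)
    return p


# Hypothetical users 0..3: 0 trusts 1, 1 trusts 2, and 1 distrusts 3.
T = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
D = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
P = propagate_one_step_distrust(T, D, alpha=(1, 0, 0, 0), k=1)  # direct-only, e1
```

With direct-only weights and k = 1, user 0 inherits the opinions of the user it trusts: a positive belief about user 2 and a negative belief about user 3. Rounding would then turn these signed values into trust or distrust predictions.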
  • 30. Data ● Directed graph from Epinions ● 131,829 nodes and 841,372 edges ● Edge labels: trust or distrust (85% trust edges) ● Distribution: power law ● Structure: symmetric bow-tie with an SCC of 41,500 nodes; 40,000 in SCC / 30,000 out of SCC ● Giant WCC of ~120,000 nodes Results 81 different schemes were evaluated. Best combination: ● k = 20 ● α = e* (combined) ● Majority rounding ● EIG ● One-step distrust Error (incorrect predictions): ε = 0.064 and ε_S = 0.147
  • 32. Kamvar, S. D., Schlosser, M. T., & Garcia-Molina, H. (2003, May). The EigenTrust algorithm for reputation management in p2p networks. In Proceedings of the 12th international conference on World Wide Web (pp. 640-651). ACM.
  • 33. EigenTrust Overview 1. Reputation system on a P2P network 2. Trust level 3. Robust against malicious peers and free riders 4. Rewards good behavior over several transactions
  • 34. Trust level A Peer will Trust: 1. Peers who have provided him authentic files. 2. Their opinions about other files. 3. Known trustworthy Peers.
  • 35. Estimating Trust ● Terminology ○ Local trust value c_ij ■ The opinion peer i has of peer j, based on past experience ■ Each time peer i downloads an authentic/inauthentic file from peer j, c_ij increases or decreases ○ Global trust value t_i ■ The trust that the entire system places in peer i
  • 36. Estimating Trust Level ● Normalization of c_ij: otherwise, malicious peers can assign arbitrarily high local trust values to other malicious peers ● Local trust vector: c_i contains all local trust values c_ij that peer i has of other peers j ● Iterative friend-of-friend reference: ○ Ask your friends: t = C^T c_i ○ Ask their friends: t = (C^T)^2 c_i ○ Ask until all nodes are reached: t = (C^T)^n c_i ○ For large n, t converges to the same vector for every peer i
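The normalization and the "ask your friends" iteration can be sketched as follows. The normalization formula c_ij = max(s_ij, 0) / Σ_j max(s_ij, 0) is the one recalled from the paper rather than stated on the slide, and the score matrix s is hypothetical.

```python
def normalize_local_trust(s):
    """c_ij = max(s_ij, 0) / sum_j max(s_ij, 0), so each row sums to 1."""
    c = []
    for row in s:
        clipped = [max(v, 0.0) for v in row]
        total = sum(clipped)
        # A row with no positive score is left as zeros in this basic sketch.
        c.append([v / total for v in clipped] if total > 0 else clipped)
    return c


def ask_friends(c, i, rounds):
    """Return (C^T)^rounds * c_i: peer i's view after `rounds` levels of friends."""
    n = len(c)
    t = list(c[i])
    for _ in range(rounds):
        t = [sum(c[k][j] * t[k] for k in range(n)) for j in range(n)]
    return t


# Hypothetical satisfaction scores s_ij (authentic minus inauthentic downloads).
s = [[0, 4, 1],
     [3, 0, 2],
     [5, -2, 0]]
c = normalize_local_trust(s)
views = [ask_friends(c, i, rounds=50) for i in range(3)]
```

After enough rounds all three peers arrive at numerically the same vector, which is why a single global trust vector can stand in for every peer's individual view.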
  • 37. Practical issues and solutions ● A priori notions of trust ○ Define some distribution p over pre-trusted peers ● Inactive peers ○ If peer i does not download from anybody else, or assigns a zero score to all other peers, its local trust vector is redefined as p, so that it trusts the pre-trusted peers ● Malicious collectives ○ This is addressed by having each peer place at least some trust in the peers that are not part of a collective
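The three fixes combine into one damped iteration. This is a sketch under assumptions: the update t ← (1 − a) · Cᵀt + a · p with t starting at p follows the paper as recalled here, while the value a = 0.15, the tolerance, and the four-peer network are hypothetical.

```python
def eigentrust(c, p, a=0.15, tol=1e-10, max_iter=1000):
    """Iterate t <- (1 - a) * C^T t + a * p until the change is below tol."""
    n = len(c)
    # Inactive peers (all-zero rows) fall back to the pre-trusted distribution p.
    rows = [row if sum(row) > 0 else list(p) for row in c]
    t = list(p)
    for _ in range(max_iter):
        nxt = [(1 - a) * sum(rows[k][j] * t[k] for k in range(n)) + a * p[j]
               for j in range(n)]
        if sum(abs(x - y) for x, y in zip(nxt, t)) < tol:
            return nxt
        t = nxt
    return t


# Hypothetical network: peer 0 is pre-trusted and trusts peer 1; peer 1 is
# inactive (all-zero row); peers 2 and 3 form a collective that trusts only itself.
c = [[0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
p = [1.0, 0.0, 0.0, 0.0]
t = eigentrust(c, p)
```

Because trust enters the system only through p and nobody outside the collective vouches for peers 2 and 3, their global trust stays at zero no matter how highly they rate each other.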
  • 38. Distributed EigenTrust ● Each peer stores its local trust vector c_i ● Each peer stores and computes its own global trust value t_i ● The pre-trusted distribution p is added to the computation
  • 39. Secure Eigentrust ● A peer should not hold his own t ○ Problem: malicious Peer can report false value ○ Solution: A different peer computes t for this peer ● t should not be computed by only one peer ○ Problem: malicious Peer can report false value for another peer ○ Solution: multiple score managers
  • 40. Experiments The performance of this scheme is assessed through simulations of a P2P network. The number of peers is usually 100, and they are connected according to a power-law model. Different threat models are simulated on this network.
  • 43. Levien, R., & Aiken, A. (1998, January). Attack-Resistant Trust Metrics for Public Key Certification. In Usenix Security.
  • 44. Trust Metrics on Network Certificates Certificate Applications: ● Authentication ● Data Integrity ● Encryption
  • 45. Trust Metrics on Network Certificates The digitally signed certificates form a directed graph, which serves as the model for deploying and testing a number of trust metrics and for measuring the attack resistance of a given certificate network. Two types of certificates: ● Binding certificates: “I believe that subject key k is the key belonging to name n” ● Delegation certificates: “I trust certificates signed by subject key k”
  • 46. Trust Metrics on Network Certificates A good trust metric ensures that there are really multiple independent sources of certification, and rejects assertions with insufficient certification. No trust metric can protect against attacks on d keys or more, where d is the minimum number of certifiers on any widely accepted key.
  • 48. Trust Metrics on Network Certificates Attack Types: ● Node attack: the attacker is able to generate any certificate from the attacked key. (stolen password) ● Edge attack: the attacker is only able to generate a delegation certificate from the attacked key. (convince key owner)
  • 49. Trust Metrics on Network Certificates Maximum Network Flow Metric: each node n in the graph is assigned a capacity C_(s,t)(n) = max(f_s(dist(s, n)), g_t(dist(n, t))), where s = source, t = target, dist(n, t) = shortest-path distance, d = degree
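The quantity underlying this metric is a maximum flow in which nodes, rather than edges, carry capacities. The sketch below computes it by node splitting and breadth-first augmenting paths (Edmonds-Karp). It is only an illustration: the slides do not define f_s and g_t, so the capacities are fixed hypothetical numbers instead of values derived from C_(s,t)(n), and the function name and graphs are assumptions.

```python
from collections import deque


def node_capacity_max_flow(edges, capacity, source, target):
    """Max flow from source to target where nodes, not edges, carry capacities.

    Each node n is split into (n, "in") -> (n, "out") with capacity[n];
    certificate edges u -> v become (u, "out") -> (v, "in"), unbounded.
    """
    inf = float("inf")
    residual = {}

    def add_arc(u, v, cap):
        residual.setdefault(u, {})[v] = residual.get(u, {}).get(v, 0) + cap
        residual.setdefault(v, {}).setdefault(u, 0)

    for n, cap in capacity.items():
        add_arc((n, "in"), (n, "out"), cap)
    for u, v in edges:
        add_arc((u, "out"), (v, "in"), inf)

    start, goal = (source, "in"), (target, "out")
    flow = 0
    while True:
        # Breadth-first search for an augmenting path (Edmonds-Karp).
        parent = {start: None}
        queue = deque([start])
        while queue and goal not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if goal not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = inf, goal
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = goal
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck


# Hypothetical certificate graphs with fixed node capacities.
independent = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
funnelled = [("s", "a"), ("a", "b"), ("a", "c"), ("b", "t"), ("c", "t")]
flow_independent = node_capacity_max_flow(
    independent, {"s": 2, "a": 1, "b": 1, "t": 2}, "s", "t")
flow_funnelled = node_capacity_max_flow(
    funnelled, {"s": 2, "a": 1, "b": 1, "c": 1, "t": 2}, "s", "t")
```

Two genuinely independent certification paths yield a flow of 2, whereas two paths that both pass through the single key a yield only 1, which matches the requirement for multiple independent sources of certification.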
  • 50. Trust Metrics on Network Certificates Results Maximum Network Flow Metric is as effective as previously suggested approaches for node attacks but is far more resistant to edge attacks.
  • 52. Dataset Table*
    Paper | Existing dataset | Suggested dataset | Reason
    TrustRank | AltaVista | Google | Google represents the web better, as it is used by more users.
    Trust & Distrust | Epinions | Amazon reviews | Evaluation on a large network; a low number of votes from people can be counted as distrust.
    EigenTrust in P2P | Simulation | Gnutella peer-to-peer network | Evaluate the consistency of the system on a large network.
    Digital Certificates | PGP key database (certificate graph) | Ego-Facebook / email-EuAll / email-Enron | Evaluate a community's resistance to the circulation of malicious information and to infiltration.
    * All suggested datasets can be found in SNAP [5]
  • 53. Conclusion ● Trust is an important aspect that should not be missing from the social web; ● We can successfully separate reputable pages from spam in a search engine using TrustRank; ● Distrust is a significant signal that should not be ignored, as it complements trust and improves the accuracy of an approach; ● Malicious peers can be identified and isolated with the EigenTrust algorithm, based on the files each user uploads; ● We can evaluate the attack resistance of a network using the Maximum Network Flow metric.
  • 54. References
    1. Gyöngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004, August). Combating web spam with TrustRank. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30 (pp. 576-587). VLDB Endowment.
    2. Guha, R., Kumar, R., Raghavan, P., & Tomkins, A. (2004, May). Propagation of Trust and Distrust. In Proceedings of the 13th international conference on World Wide Web (pp. 403-412).
    3. Kamvar, S. D., Schlosser, M. T., & Garcia-Molina, H. (2003, May). The EigenTrust algorithm for reputation management in p2p networks. In Proceedings of the 12th international conference on World Wide Web (pp. 640-651). ACM.
    4. Levien, R., & Aiken, A. (1998, January). Attack-Resistant Trust Metrics for Public Key Certification. In Usenix Security.
    5. Stanford Network Analysis Project: http://snap.stanford.edu/