1. Robust Expert Ranking in Online Communities - Fighting Sybil Attacks
8th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom2012)
October 14–17, 2012, Pittsburgh, Pennsylvania, United States
Khaled A. N. Rashed, Cristina Balasoiu, Ralf Klamma
RWTH Aachen University
Advanced Community Information Systems (ACIS)
{rashed|balsoiu|klamma}@dbis.rwth-aachen.de
Lehrstuhl Informatik 5 (Information Systems), Prof. Dr. M. Jarke
Deutschen Akademischen Austauschdienstes
2. Advanced Community Information Systems (ACIS)
Responsive Open Community Information Systems
– Web Engineering
– Web Analytics
– Community Visualization and Simulation
– Community Support
– Community Analytics
– Requirements Engineering
3. Agenda
Introduction and motivation
Related work
Our Approach
– Expert ranking algorithm
– Robustness of the expert ranking algorithm
Evaluation
Conclusions and outlook
4. Introduction
Expert search and ranking refers to finding a group of authoritative users with special skills and knowledge for a specific category.
The task is very important in online collaborative systems.
Problems: openness and misbehaviour
– No attention has been paid to the trust and reputation of experts
Solution: leveraging trust
5. Motivation Examples
Manipulating the truth for war propaganda
– Published as: British soldiers abusing prisoners in Iraq
– Proved to be fake by Brigadier Geoff Sheldon, who said the vehicle featured in the photo had never been to Iraq
Tidal bores presented as Indian Ocean Tsunami
– Published as: 2004 Indian Ocean Tsunami
– Proved to be tidal bores at a four-day-long government-sponsored tourist festival in China
Expert knowledge, analysis and witnesses are needed to identify the fake!
6. A Case Study: Collaborative Fake Multimedia Detection System
Collaborative activities (rating, tagging and commenting)
– Provide new means of search, retrieval and media authenticity evaluation
– Explicit ratings and tags are used for evaluating the authenticity of multimedia items
– Reliability: not all of the submitted ratings are reliable
– No centralized control mechanism
– Vulnerability to attacks
Three types of users
– Honest users
– Experts
– Malicious users
7. Research Questions and Goals
Research questions
– How to measure users’ expertise in collaborative media sharing and evaluating systems, and how to rank the users?
– What is the implication of trust?
– Robustness: how to ensure robustness of the ranking algorithm?
Goals
– Improve multimedia evaluation
– Reduce the impact of malicious users
8. Related Work
Probabilistic models, e.g. [Tu et al. 2010]
Voting models [Macdonald and Ounis 2006] [Macdonald et al. 2008]
Link-based approaches: PageRank [Brin and Page 1998], HITS [Kleinberg 1999] and their variations; SPEAR algorithm [Noll et al. 2009]
ExpertRank [Jiao et al. 2009]
TREC enterprise track: find the associations between candidates and documents, e.g. [Balog 2006, Balog 2007]
Machine learning algorithms, e.g. [Bian and Liu 2008, Li et al. 2009]
9. Our Approach
Assumptions
– Expert users tend to have many authenticity ratings
– Correctly evaluated media are rated by users of high expertise
– Following expert users provides more benefits
Expert definition
– Rates a large number of media files in an authentic way with respect to a topic and is highly trusted by his directly connected users
– Should be trustworthy in evaluating multimedia
10. Expert Ranking Methods
Domain knowledge driven method
– Considers tags that users assign to media files
– User profile: merges the tags the user submitted to the media files in the system
– Similarity coefficient between the candidate profile and the tags assigned to a specific resource
– Used to reorder the users who voted on a media file according to the tag profile
Domain knowledge independent method
– Uses the connections between users and resources to decide on the expertise of the users
– A modified version of the HITS algorithm
– Mutual reinforcement of users’ expertise and media
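The domain knowledge driven method can be illustrated with a minimal sketch. The slide does not name the similarity coefficient, so the Jaccard coefficient is used here purely as an assumption; the function names `build_profile`, `jaccard` and `reorder_voters` and the input layout are hypothetical as well.

```python
def build_profile(tag_assignments):
    """Merge all tags a user submitted across media files into one tag set."""
    profile = set()
    for tags in tag_assignments.values():
        profile.update(tags)
    return profile


def jaccard(a, b):
    """Jaccard coefficient of two tag collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reorder_voters(voter_tags, resource_tags):
    """Order the voters of one media file by profile/resource tag similarity.

    voter_tags: {user: {media_id: [tags the user assigned to that media file]}}
    resource_tags: tags assigned to the media file under evaluation
    """
    scores = {
        user: jaccard(build_profile(assignments), resource_tags)
        for user, assignments in voter_tags.items()
    }
    # Highest similarity first; ties broken by user name for determinism.
    return sorted(scores, key=lambda user: (-scores[user], user))


voters = {
    "alice": {"m1": ["tsunami", "flood"], "m2": ["wave"]},
    "bob": {"m3": ["iraq"]},
}
print(reorder_voters(voters, ["tsunami", "wave"]))
```

Any other set-based coefficient (Dice, cosine over tag counts) could be swapped in for `jaccard` without changing the reordering step.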
11. MHITS: Expert Ranking Algorithm
MHITS: Expert ranking algorithm in online collaborative systems
– Link-based approach, based on the HITS algorithm
– HITS
  – Authorities: pages that are pointed to by good pages
  – Hubs: pages that point to good pages
  – Reinforcement between hubs and authorities
– MHITS
  – Users act as hubs (correctly evaluated media rated by them)
  – Media files act as authorities
  – Mutual reinforcement between users and media files
  – Local trust values between users are assigned
  – Considers the ratings of the users
12. MHITS: Expert Ranking Algorithm
$a(m) = \sum_{u \in U(m)} h(u) \, r(u)$
$h(u) = \beta \sum_{m \in M(u)} a(m) \, r(u) + (1 - \beta) \, t(u)$
Symbol   Description
a(m)     Authority score
U(m)     Set of users pointing to media file m
h(u)     Hubness score
r(u)     Rating of user u for media file m
t(u)     Average trust of the directly connected users to user u
M(u)     Set of media files to which user u points
β        Coefficient that weights the influence of the two terms, in range [0, 1]
Networks: one for users and ratings, one for users only (trust network)
Trust: in range [0, 1]
Ratings: 0.5 for a fake vote, 1 for an authentic vote
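The two update rules can be sketched as a short iteration. This is only an illustration under stated assumptions, not the authors' Java implementation: the per-iteration normalization, the fixed iteration count, the default β = 0.5 and the names `mhits` and `rank_experts` are not given on the slide.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (assumed step, as in classic HITS)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def mhits(ratings, trust, beta=0.5, iterations=50):
    """Return (hub score per user, authority score per media file).

    ratings: {user: {media: rating}}, 1.0 = authentic vote, 0.5 = fake vote
    trust:   {user: average trust t(u) in [0, 1] from directly connected users}
    """
    media = {m for user_ratings in ratings.values() for m in user_ratings}
    h = {u: 1.0 for u in ratings}
    a = {m: 1.0 for m in media}
    for _ in range(iterations):
        # a(m) = sum over users u in U(m) of h(u) * r(u)
        a = {m: 0.0 for m in media}
        for u, user_ratings in ratings.items():
            for m, r in user_ratings.items():
                a[m] += h[u] * r
        a = _normalize(a)
        # h(u) = beta * sum over m in M(u) of a(m) * r(u) + (1 - beta) * t(u)
        h = {
            u: beta * sum(a[m] * r for m, r in user_ratings.items())
            + (1 - beta) * trust.get(u, 0.0)
            for u, user_ratings in ratings.items()
        }
        h = _normalize(h)
    return h, a


def rank_experts(ratings, trust, beta=0.5):
    """Users ordered by descending hub score (ties broken by name)."""
    h, _ = mhits(ratings, trust, beta)
    return sorted(h, key=lambda u: (-h[u], u))


ratings = {
    "u1": {"m1": 1.0, "m2": 1.0, "m3": 1.0},
    "u2": {"m1": 1.0},
    "u3": {"m2": 0.5},
}
trust = {"u1": 0.9, "u2": 0.5, "u3": 0.1}
print(rank_experts(ratings, trust))
```

With β = 1 the trust term vanishes and the sketch reduces to a rating-weighted HITS; with β = 0 users are ranked by average trust alone.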
13. Robustness of the MHITS Algorithm
Compromising techniques
– Sybil attack [Douc02], reputation theft, whitewashing attack, etc.
– Compromising the input and the output of the algorithm
Sybil attack
– Fundamental problem in online collaborative systems
– A malicious user creates many fake accounts (Sybils) which all reference the user to boost his reputation (the attacker’s goal is to be higher up in the rankings)
Countermeasures against Sybil attack
                                  SybilGuard [YKGF06]   SybilLimit [YGKX08]   SumUp [TMLS09]
Protocol type                     Decentralized         Decentralized         Centralized
Accepted Sybils per attack edge   O(√n log n)           O(log n)              1 + log n
14. SumUp
Centralized approach
– Aims to aggregate votes in a Sybil-resilient manner
Key idea: an adaptive vote flow technique that appropriately assigns and adjusts link capacities in the trust graph to collect the votes for an object
New: we integrate SumUp with MHITS
– Java implementation that uses its own data structure based on Java sparse arrays
SumUp steps
(1) Assign the source node and number of votes per media file
(2) Levels assignment
(3) Pruning step
(4) Capacity assignment
(5) Max-flow computation: collect votes on each resource
(6) Leverage user history to penalize adversarial nodes
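Steps (2) and (5) can be sketched as follows. This is a strongly simplified illustration, not SumUp itself: pruning, ticket-based capacity assignment and the history-based penalty are omitted, every trust link is assumed to have capacity one, and the names `bfs_levels` and `collect_votes` are hypothetical.

```python
from collections import deque


def bfs_levels(trust_edges, source):
    """Step (2): level of each node = hop distance from the vote collector."""
    adj = {}
    for u, v in trust_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    level = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def collect_votes(trust_edges, collector, voters, link_capacity=1):
    """Step (5): number of votes that reach the collector via max-flow.

    Edmonds-Karp on the trust graph; each voter injects one vote and each
    trust link carries at most link_capacity votes (assumed uniform here).
    """
    cap = {}

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual edge

    for u, v in trust_edges:
        add_edge(u, v, link_capacity)
        add_edge(v, u, link_capacity)
    super_source = object()  # artificial node feeding one vote per voter
    for voter in voters:
        add_edge(super_source, voter, 1)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {super_source: None}
        queue = deque([super_source])
        while queue and collector not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if collector not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = collector
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = collector
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


# Honest region: c-a, c-b, a-h1. Sybil region s0..s3 attached by one attack edge b-s0.
edges = [("c", "a"), ("c", "b"), ("a", "h1"),
         ("b", "s0"), ("s0", "s1"), ("s0", "s2"), ("s0", "s3")]
print(collect_votes(edges, "c", ["h1", "s1", "s2", "s3"]))
```

In this toy graph the three Sybil votes share a single attack edge, so at most one of them is collected alongside the honest vote, which is the effect the capacity-limited vote flow is meant to achieve.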
15. Integration of SumUp with MHITS
16. Evaluation
Experimental setup
– Barabási-Albert model for generating the network
– 300 users
– 20 media files (10 known to be fake and 10 known to be authentic)
– 800 ratings
– 3000 trust edges
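A network of this kind can be generated with a few lines of preferential attachment. The sketch below is an assumption about how such a setup might be produced, not the authors' generator: the slide does not give the attachment parameter, and with the hypothetical choice m = 10 this variant yields (300 - 10) * 10 = 2900 edges, close to but not exactly the 3000 trust edges reported.

```python
import random


def barabasi_albert_edges(n, m, seed=None):
    """Undirected Barabási-Albert graph: each new node attaches to m existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = set()
    targets = list(range(m))  # the first new node links to the m initial nodes
    repeated = []             # each node appears once per incident edge
    for new in range(m, n):
        for t in targets:
            edges.add((t, new))
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Sampling from `repeated` is sampling proportional to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


trust_edges = barabasi_albert_edges(300, 10, seed=1)
print(len(trust_edges))
```

Trust values in [0, 1] could then be attached to each edge, for example by drawing them uniformly at random; that choice is likewise not specified on the slide.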
17. Ratings Distribution
18. Evaluation
Evaluation metrics:
– Precision@K
$\text{Precision@K} = \frac{|TopK' \cap TopK|}{K}$
– Spearman’s rank correlation coefficient
$\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$
Scale: +1 perfect positive correlation, 0 no correlation, -1 perfect negative correlation
ρs: Spearman’s coefficient of rank correlation, -1 ≤ ρs ≤ 1
di: the difference between the rank of xi and the rank of yi
n: the number of data points in the sample (total number of observations)
ρs = -1 or 1: a high degree of correlation between x and y
ρs = 0: a lack of monotonic association between the two variables
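Both metrics are easy to state in code. In this minimal sketch TopK' is read as the top K of the algorithm's ranking and TopK as the top K of the reference ranking (by number of fair votes, as described in the notes); the function names are hypothetical, and the Spearman formula assumes both lists rank the same items without ties.

```python
def precision_at_k(ranked, reference, k):
    """Fraction of the algorithm's top-K users that are also in the reference top-K."""
    return len(set(ranked[:k]) & set(reference[:k])) / k


def spearman_rho(ranking_x, ranking_y):
    """Spearman's rank correlation of two rankings of the same items (no ties)."""
    n = len(ranking_x)
    position_y = {item: i for i, item in enumerate(ranking_y)}
    # d_i is the difference between the rank of item i in x and in y.
    d_squared = sum((i - position_y[item]) ** 2 for i, item in enumerate(ranking_x))
    return 1 - 6 * d_squared / (n * (n * n - 1))


reference = ["u1", "u2", "u3", "u4", "u5"]
print(precision_at_k(["u1", "u3", "u2", "u4", "u5"], reference, 2))
print(spearman_rho(reference, reference))
print(spearman_rho(reference, list(reversed(reference))))
```

Precision@K ignores the order inside the top K, which is why the rank correlation is reported alongside it.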
19. Experimental Results I
No Sybils
Results are compared with the ranking of the users according to the number of fair ratings each of them had in the system
           HITS   MHITS
Spearman   0.87   0.93
n=15
20. Experimental Results II
10% Sybils, 4 attack edges
           HITS   MHITS   MHITS & SumUp
Spearman   0.52   0.68    0.93
n=20
21. Experimental Results III
Precision@K
Figure panels: 10% Sybils (one group) and 8 attack edges; 20% Sybils (one group) and 24 attack edges
22. Further evaluation
Number of Sybil votes increased from 3% to 17% of the total number of fair votes
– Expertise ranking does not change
Number of attack edges increased from 9 to 14 and 24, keeping the number of Sybil votes at 17% of the number of fair votes and a constant number of Sybils (50)
– Precision does not change
Number of Sybil votes increased from 17% to 50% and then to 100%, keeping the number of attack edges (24) and the number of Sybils constant
K    MHITS (20%)   MHITS & SumUp (20%)   MHITS (50%)   MHITS & SumUp (50%)   MHITS (100%)   MHITS & SumUp (100%)
12   0.91          0.91                  0.27          0.33                  0.08           0.08
15   0.93          0.93                  0.33          0.40                  0.06           0.06
23. Conclusions and Future Work
Conclusions
– Proposed an expertise ranking algorithm for collaborative systems (fake multimedia detection systems)
– Leveraged trust and showed the trust implications
– Combined expert ranking with Sybil-resistant algorithms
Future Work
– Apply the algorithm to real data and to different data sets
– Temporal analysis (time series analysis)
– Integrate the domain knowledge driven method
Editor's Notes
Fake multimedia and misbehaviour
e.g. Press Agencies
We discuss the notions of experts and expertise in the context of collaborative fake multimedia detection systems. Here we try to define the expert and we assume that …. Improve media evaluation (by increasing the impact of experts).
SybilGuard and SybilLimit are decentralized; SumUp is centralized. SybilGuard is based on the “social network” among user identities, where an edge between two identities indicates a human-established trust relationship. Malicious users can create many identities but few trust relationships. Thus, there is a disproportionately small “cut” in the graph between the Sybil nodes and the honest nodes. SybilGuard exploits this property to bound the number of identities a malicious user can create. SybilLimit leverages the same insight as SybilGuard but is an improved version that reduces the number of Sybil nodes accepted by an honest node from O(√n log n) to O(log n) for n honest nodes. When all nodes vote, SumUp leads to a much lower attack capacity than SybilLimit despite the same asymptotic bound per attack edge. First, SumUp’s bound of 1 + log n in Theorem 5.1 is a loose upper bound of the actual average capacity. Second, since links pointing to lower-level nodes are not eligible for ticket distribution, many incoming links of an adversarial node have zero tickets and thus are assigned a capacity of one.
P@K computes, for a given result of ranked users, the fraction of relevant results in the top K results. The higher the precision, the better the performance. We use this metric to compare the results of the expert ranking algorithms that we developed with the ranking of experts obtained by counting the number of fair votes. Spearman’s rank correlation coefficient is a non-parametric measure of statistical dependence between two ranked lists. It is based on the rank order of the scores and not on the score data, that is, it is the correlation coefficient between the ranked variables. d is the difference in rank between paired items in the two series (lists).
For this step of the evaluation, I assume that all users in the network are behaving in a fair way and are rating a random number of media files. So the only way a user can rate a media file wrongly is when the user has no competence in the specific topic. What is different in the two methods is that, besides the reinforcement between users voting fairly and authentic media files, the ranking in the case of MHITS also considers the local trust values the user has in the social network. Since average precision ignores the exact rank of a user, we use Spearman's rank correlation coefficient to get a better view of the efficiency. In Table 6.2, the correlation coefficients for n = 15 are presented. One can notice that the result of the MHITS algorithm is more highly correlated with the ranking by number of fair media file ratings, as its value is closer to 1.
From the results, we can see that our proposed model, the integration of SumUp into the MHITS algorithm, outperforms HITS and MHITS without SumUp, which confirms the effectiveness of our approach. As can be seen, MHITS in combination with SumUp performs better for K = 10, and then for K = 20 its precision decreases even more rapidly than that of MHITS alone. We think that this happens because some Sybil users already enter the ranking for K = 20 due to their high local trust values, and therefore the precision decreases.
It can be noticed that by increasing the number of Sybils, the attack edges or even the votes (up to 50% of the number of fair votes), the ranking of the users does not change dramatically. It can also be seen that Modified HITS with SumUp performs only slightly better than Modified HITS alone. The reason is that the steps SumUp adds when run together with HITS (pruning of the trust network, assignment of capacity in the network and elimination of the links that possess a high negative history) do not affect the Sybils. The capacity assignment does not reach them, so votes from Sybils do not reach the source node. In this case, the edges connecting Sybils to fair nodes do not accumulate negative history and therefore are not eliminated. On the resulting network, Modified HITS is run again. The Sybils are kept and, due to the high local trust values that they receive from the other Sybil nodes in the group, they get into the top ranks of experts.
Combination of expert ranking and Sybil-resistant algorithms to ensure robustness