2. Background
• Information is often untrustworthy
• Erroneous (e.g., news sites during breaking events)
• Misleading (e.g., malicious sources)
• Biased (e.g., political domains)
• Outdated (e.g., a knowledge base that is not updated
frequently)
• This phenomenon is amplified by widespread
information dependency among sources (copy-paste)
• It is difficult for the user to discern the correctness of
information and the trustworthiness of the sources
4. Data Corroboration
• Early Corroboration
• Frequency-based approach
• Recent work on Corroboration techniques
• Trustworthiness of sources
A measure s(s) that quantifies the precision of a source s
• Probability of information (facts)
A measure s(f) that quantifies the probability that a fact f is true
• Starting with a default s(s), iteratively compute the
probabilities for the facts and the trustworthiness of the
sources
• Machine-Learning approaches
• Some corroboration problems can be seen as an ML
classification problem
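The two families above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of any cited paper: the vote data, the default trust of 0.8, and the rounding step used as normalization are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical votes: VOTES[source][fact] is True (affirms) or False (denies).
VOTES = {
    "A": {"f1": True, "f2": True, "f3": False},
    "B": {"f1": True, "f2": False},
    "C": {"f1": True, "f2": False, "f3": False},
}


def majority_vote(votes):
    """Frequency-based approach: a fact is true if most of its voters affirm it."""
    tally = defaultdict(lambda: [0, 0])  # fact -> [affirm count, deny count]
    for claims in votes.values():
        for fact, value in claims.items():
            tally[fact][0 if value else 1] += 1
    return {fact: yes > no for fact, (yes, no) in tally.items()}


def iterative_corroboration(votes, default_trust=0.8, rounds=10):
    """Alternate between fact probabilities s(f) and source trust scores s(s)."""
    trust = {source: default_trust for source in votes}
    prob = {}
    for _ in range(rounds):
        # Fact step: average, over voters, of how strongly each vote supports f.
        sums, counts = defaultdict(float), defaultdict(int)
        for source, claims in votes.items():
            for fact, value in claims.items():
                sums[fact] += trust[source] if value else 1.0 - trust[source]
                counts[fact] += 1
        # Normalization (an assumption of this sketch): round to 0 or 1.
        prob = {f: 1.0 if sums[f] / counts[f] >= 0.5 else 0.0 for f in sums}
        # Source step: trust is the average agreement with the current labels.
        for source, claims in votes.items():
            agreement = [prob[f] if v else 1.0 - prob[f] for f, v in claims.items()]
            trust[source] = sum(agreement) / len(agreement)
    return prob, trust


labels = majority_vote(VOTES)
prob, trust = iterative_corroboration(VOTES)
```

On this toy input both functions label f1 true and f2, f3 false. The iterative version additionally lowers the trust of source A to 2/3 because A disagrees with the consensus on f2, while B and C end at 1.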
5. What if there are no conflicts?
• Does the presence of information without
contradictions mean it is correct?
6. Our Problem: Corroborating Information with only Affirmative Statements
• We focus on scenarios in which sources have little
or no dissent
• Frequent real-world problem (rumors, hard-to-rebut
claims)
• Difficult to identify incorrect information since all
reported information is consistent
• Existing corroboration approaches do not work
well
• Rely on conflicting information to differentiate the
trustworthiness of the sources
7. Contributions
• Novel corroboration approach:
• Assigns multiple trust scores to each source
• Considers the trustworthiness of the source for a group of
facts
• Corroboration algorithm incrementally evaluates facts
• Groups unknown facts based on the sources reporting
them
• Makes decisions based on information entropy
• Extensive experiments on real-world and synthetic data that
demonstrate the benefits of our method
8. Evaluation Setting
• Corroboration task: identify legitimate restaurant listings in NYC given
the listing information from a set of sources
• Sources for restaurant address: Citysearch,
Foursquare, Menupages, Opentable, Yellowpages,
Yelp
• Golden set
• Selected restaurants in 3 zip codes: 601 listings
• Verified their legitimacy in person (Apr 2012)
• 340 true and 261 false
9. Motivating Example
Sources: Opentable, Yelp, Menupages, Citysearch, Yellowpages
Restaurant | Source votes (blank cells not shown) | Correct value
M Bar | T T | true
Sam’s | T T T T | true
27 Sunshine | T T T | true
Crepe Creations | T T | false
El Portal | T T | false
Holy Basil | F T | false
Papatzul | T T | true
Wine Spot | T T | true
Vbar | T T | true
Wai Cafe | T T | false
Tomoe Sushi | T T T | true
Khushie 139 | F F T | false
10. State-of-the-art Corroboration Strategies
Approaches
• TwoEstimate [Galland WSDM’10]
• Iteratively estimates the trust score of the sources
and the probability of the facts
• BayesEstimate [Zhao VLDB’12]
• Uses a Bayesian graphical model
• Considers two-sided errors (false positives and
false negatives)
Approach | Precision | Recall | Accuracy | Computed trust scores
TwoEstimate | .64 | 1 | .67 | (1, 1, 0.8, 0.9, 1)
BayesEstimate | .58 | 1 | .58 | (1, 0.8, 0.6, 1, 1)
The computed trust scores are used to evaluate each fact!
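For readers who want to check the precision, recall, and accuracy figures, the following sketch computes the three metrics from boolean labels. The `actual` list is the "Correct value" column of the motivating example on slide 9. The all-true `predicted` list is a hypothetical predictor introduced here for illustration, not the output of either cited approach.

```python
def precision_recall_accuracy(predicted, actual):
    """Return (precision, recall, accuracy) for two equal-length boolean lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # accepted, truly correct
    fp = sum(1 for p, a in pairs if p and not a)      # accepted, actually false
    fn = sum(1 for p, a in pairs if not p and a)      # rejected, actually true
    tn = sum(1 for p, a in pairs if not p and not a)  # rejected, truly false
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Correct values of the twelve listings in the motivating example (slide 9).
actual = [True, True, True, False, False, False,
          True, True, True, False, True, False]
# Hypothetical predictor that accepts every listing.
predicted = [True] * len(actual)

precision, recall, accuracy = precision_recall_accuracy(predicted, actual)
```

Accepting all twelve listings gives precision 7/12, recall 1, and accuracy 7/12, which round to .58, 1, and .58. These happen to coincide with the figures reported for BayesEstimate on this slide, although that alone does not establish how the approach labeled each listing.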
11. Key Observation
• Using the same trust score to judge the correctness
of all information is too coarse
• Each source may exhibit different accuracy towards
different groups of facts
• The corroboration result could be greatly improved if
we could derive finer-grained trust scores for each
source
Multi-value trust scores for sources
12. Trust Scores
• Single-value trust scores (s(s))
• A single measure for each source
• Each fact is evaluated using the same value from each
source
• Multi-value trust scores
• A group of values assigned to each source
s(s) = < s1(s), s2(s), …>
• Each fact (or group of facts) is evaluated using one of the trust
values from each source
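The difference between the two kinds of score can be pictured as a data-structure choice. The sketch below is illustrative only: the class names, the `score_for` method, the idea of indexing trust values by a fact-group number, and the numeric values are all assumptions made for this example and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SingleValueTrust:
    """One trust value s(s), used for every fact the source reports."""
    value: float

    def score_for(self, group_index: int) -> float:
        # The group is ignored: every fact sees the same score.
        return self.value


@dataclass
class MultiValueTrust:
    """A vector s(s) = <s1(s), s2(s), ...>, one value per fact group."""
    values: List[float] = field(default_factory=list)

    def score_for(self, group_index: int) -> float:
        # Each group of facts is judged with its own trust value.
        return self.values[group_index]


# Hypothetical scores for one source.
coarse = SingleValueTrust(0.8)
fine = MultiValueTrust([0.9, 0.6])

coarse_scores = [coarse.score_for(0), coarse.score_for(1)]
fine_scores = [fine.score_for(0), fine.score_for(1)]
```

With the single-value score both groups are evaluated at 0.8, whereas the multi-value score lets the same source count for 0.9 on one group and 0.6 on another.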
13. Multi-Value Trust Scores
• Two major challenges
• How to calculate the trust values for each source
• How to decide which sources’ trust values to
consider for each fact
• Solution: an incremental evaluation mechanism
• Select a subset of facts to process
• Update the trust values based on the already
processed facts
• Facts are assigned a truth value when they are
processed
14. How to Select Facts?
• Model each fact f as a random variable
• Objective: compute the probability s(f) that f is true
• Information Entropy approach:
• Consider the entropy H(f) of each fact f
• The entropy of a random variable measures its uncertainty
• Our solution: select facts such that the entropy of
unknown facts is maximized
• Existing corroboration techniques normalize their results
to attain a probability of 1 (or 0) for each fact, i.e., entropy
of 0
• Reducing uncertainty leads to (too) early consensus
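As a reminder of the quantity involved, a fact modeled as a binary random variable with probability p of being true has entropy H = -p log2(p) - (1 - p) log2(1 - p). The helper below is a minimal sketch of that standard formula; the function name is chosen here for illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a fact that is true with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fact known to be true or false carries no uncertainty
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


certain = binary_entropy(1.0)
undecided = binary_entropy(0.5)
leaning = binary_entropy(0.9)
```

A probability of 1 (or 0) gives entropy 0, a probability of 0.5 gives the maximum of 1 bit, and a probability of 0.9 gives about 0.47 bits, which is why forcing every fact to 0 or 1 removes all remaining uncertainty.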
15. Heuristics for Selecting Facts
• Group facts based on the votes from sources
• At each step i:
• Calculate the entropy of each fact group using si(s)
• Calculate ΔH(FG) for each fact group FG
(Represents the change of entropy if FG is selected)
• Select both positive and negative fact groups with highest
ΔH(FG)
• Assign positive and negative values to the same number of
facts
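The first step of the heuristic, grouping facts by the votes they received, can be sketched as follows. The input layout, the source names A and B, and the function name are assumptions for illustration. The slides do not define ΔH(FG), so the selection step itself is not implemented here.

```python
from collections import defaultdict


def group_by_votes(votes):
    """Group facts that received exactly the same votes from the same sources.

    votes maps each fact to a dict {source: True/False}. The returned dict
    maps a vote signature (a sorted tuple of (source, vote) pairs) to the
    list of facts sharing that signature.
    """
    groups = defaultdict(list)
    for fact, by_source in votes.items():
        signature = tuple(sorted(by_source.items()))
        groups[signature].append(fact)
    return dict(groups)


# Hypothetical votes from two sources, A and B.
votes = {
    "f1": {"A": True, "B": True},
    "f2": {"B": True, "A": True},
    "f3": {"A": True},
    "f4": {"A": True, "B": False},
}

fact_groups = group_by_votes(votes)
```

Here f1 and f2 fall into the same group because both are affirmed by A and B, while f3 and f4 each form a group of their own. Every fact in a group would then share the same entropy under the current trust values, so groups rather than individual facts can be compared at each step.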
21. Conclusion
• Proposed techniques for corroborating facts with
mostly affirmative statements
• Designed a novel algorithm that adopts a multi-value
trust score for the sources
• Incrementally selects facts by leveraging the information
entropy of unknown facts
• Uses different sets of sources’ trust scores to evaluate each
set of facts
• Performed experiments using both real-world and
synthetic data (see paper)