This document summarizes Rob Ennals' work on developing a "Dispute Finder" tool to help users identify when information they encounter online is disputed by other credible sources. The tool includes a browser extension that highlights disputed claims on web pages, and links to sources supporting or opposing the claim. It also mines the web for patterns indicating disputed claims and labels the claims through crowdsourcing. Key challenges discussed include filtering ambiguous claims, scaling entailment detection, and determining what should count as a disputed claim. User studies found people were frustrated by a low number of highlighted claims and confused about how specific claims should be.
Disputed Claims Browser Extension Highlights Controversial Web Information
1. 1/44 Is there another side to this? Identifying Disputed Information on the Web. Rob Ennals, Intel Research Berkeley (rob@ennals.org). Work done in collaboration with: John Mark Agosta, Dan Byler, Beth Trushkowsky, Barbara Rosario, Tad Hirsch, Tye Rattenbury.
6. 5/44 Old Model: a small number of known sources (TV, radio, newspapers, book publishers). New Model: a huge number of unknown sources (blogs, random websites, foreign newspapers). This is not just an issue of source credibility: if we ignore untrusted sources, we ignore a lot of the information on the web.
7. 6/44 Dispute Finder: inform users when information that they encounter in their lives is disputed by a source that they might trust
8. 7/44 Browser extension: a Firefox extension examines every page you browse (including email, intranet pages, etc.) and highlights claims that are disputed.
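The slide does not say how the extension marks matched claims on a page. Below is a minimal Python sketch, assuming exact, case-insensitive phrase matching against a list of known claim strings; `find_disputed_spans` and `highlight` are hypothetical names, and wrapping matches in `<mark>` tags is an illustrative choice rather than the extension's actual markup.

```python
import re


def find_disputed_spans(page_text: str, claims: list[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, claim) for every case-insensitive occurrence of a known claim."""
    spans = []
    for claim in claims:
        # re.escape makes the claim match as a literal phrase, not as a regex.
        for m in re.finditer(re.escape(claim), page_text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), claim))
    return sorted(spans)


def highlight(page_text: str, claims: list[str]) -> str:
    """Wrap each disputed span in <mark> tags; a span overlapping an earlier one is skipped."""
    out, pos = [], 0
    for start, end, _ in find_disputed_spans(page_text, claims):
        if start < pos:  # overlaps a span that was already highlighted
            continue
        out.append(page_text[pos:start])
        out.append("<mark>" + page_text[start:end] + "</mark>")
        pos = end
    out.append(page_text[pos:])
    return "".join(out)
```

For example, `highlight("Some say the moon is made of cheese.", ["the moon is made of cheese"])` returns `"Some say <mark>the moon is made of cheese</mark>."`. Exact matching like this misses paraphrases, which is why the deck turns to entailment-based comparison against known disputes.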
9. 8/44 Click a dispute for more information: shows sources that support or oppose the claim.
10. 9/44 Search Engine Front-End: built with Yahoo BOSS; examines the text on all linked pages.
11. 10/44 Early Work: Mobile Voice Interface. Currently an early prototype running on a laptop, based on Dragon NaturallySpeaking. It listens to everything people say around you, keeps a list of disputed things you may have heard, and vibrates when you hear something disputed.
14. People seem to like it. Covered by: NPR, New Scientist, Fast Company, Christian Science Monitor, Wall Street Journal, NY Times Bay Area, San Jose Mercury, SF Chronicle, The Guardian, ACM TechNews, CBC (Canadian Public Radio), CNET, Sacramento Bee, and many others. TG Daily: “This is hands down, the most amazing idea I’ve ever heard of when it comes to using the web”. Paper accepted for WWW 2010 and WICOW 2010.
16. 15/44 Related Work: Social Annotation. Videolyzer, Diigo, SpinSpotter. Drawback: you need to mark every instance individually.
17. 16/44 Related Work: Fact Checker Sites. Drawback: you need to already suspect that something may be disputed.
18. 17/44 Related Work: Source Rating. Automatic quality metrics. But non-credible sources still have useful information, and credible sources still get stuff wrong.
19. 18/44 Related Work: Wiki Source Tracking. WikiTrust, WikiScanner. Who wrote this, and are they credible or biased? Great if your content is on Wikipedia.
21. 20/44 Compare Observed Text to Known Disputes. Example: "Glenn Beck falsely claimed that the moon is made of cheese, despite clear evidence to the contrary." False claim: "the moon is made of cheese". Disputed by: Huffington Post, New York Times. Context: ... Entailment: "We should mine the moon because it is made of cheese".
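Real textual entailment is far harder than string matching. As a rough illustration only, the sketch below swaps in a crude bag-of-words stand-in: a known claim "matches" a sentence when most of the claim's words appear in it. The `containment` score, `matches_known_dispute`, and the 0.8 threshold are assumptions, not the method Dispute Finder uses.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def containment(claim: str, sentence: str) -> float:
    """Fraction of the claim's distinct tokens that also occur in the sentence."""
    claim_toks = tokens(claim)
    if not claim_toks:
        return 0.0
    return len(claim_toks & tokens(sentence)) / len(claim_toks)


def matches_known_dispute(sentence: str, claims: list[str], threshold: float = 0.8) -> list[str]:
    """Known disputed claims whose tokens are (mostly) present in the sentence.

    Hypothetical stand-in for entailment: word order and negation are ignored.
    """
    return [c for c in claims if containment(c, sentence) >= threshold]
```

On the slide's example, every word of "the moon is made of cheese" occurs in "We should mine the moon because it is made of cheese", so the score is 1.0. Because it ignores word order and negation, this stand-in also scores "the moon is not made of cheese" as a full match; a real entailment component has to handle such cases.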
23. 22/44 Contradiction detection vs. dispute detection. Contradiction detection: does statement X logically contradict statement Y? This is hard: it needs lots of real-world knowledge. Dispute detection: does author A believe that statement X is disputed or misleading? Humans determine what is actually disputed, and which disputes are interesting. It only detects contradictions that humans find, but it also detects statements that are misleading without being wrong. Once we have determined that a dispute is real, we could use contradiction detection and sentiment analysis to see who is on each side.
24. 23/44 A statement can be misleading without being wrong. Examples: "GM's misleading claim that the Chevrolet Volt gets 230 miles per gallon"; "deceptively claimed that fast food could be nutritious". Logical truth isn't all that interesting. We want to know if there is a different way of looking at the subject: a different frame.
26. 25/44 Use Patterns to Find Disputed Claims. Examples: "the false claim that Himalayan glaciers could melt away by 2035"; "it is not true that anyone aged over 59 cannot receive heart repairs"; "the misconception that everyone in the south are stupid"; "the delusion that scientists in different countries do science differently"; "into believing that Van Morrison had a new baby"; "the myth that we can't afford good working conditions for everyone"; "misleadingly claimed that unemployment is lower than the '70s". We built a simple grammar for such prefixes. There are currently 1,293 patterns, identified on ~35 million web pages, of which we have downloaded and processed 2 million. Restricting patterns to prefixes allows us to search for them using Yahoo BOSS. Future: automatically infer a larger grammar of patterns.
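The slide does not show the grammar itself. A minimal sketch, assuming the grammar can be approximated by a flat list of literal prefixes (taken from the slide's examples) and that a claim runs from the end of the prefix to the next sentence-ending punctuation mark; `PREFIXES`, `PATTERN`, and `extract_claims` are hypothetical.

```python
import re

# Hypothetical subset of prefixes, copied from the slide's examples;
# the real grammar (1,293 patterns) is not reproduced here.
PREFIXES = [
    "the false claim that",
    "falsely claimed that",
    "misleadingly claimed that",
    "the misconception that",
    "the myth that",
    "the delusion that",
    "it is not true that",
    "into believing that",
]

# One alternation of escaped prefixes; the captured claim runs up to the
# next '.', '!', '?' or ';' (an assumed, deliberately naive boundary rule).
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")\s+([^.!?;]+)",
    re.IGNORECASE,
)


def extract_claims(text: str) -> list[str]:
    """Return the text following each known prefix, up to the end of the sentence."""
    return [m.group(1).strip() for m in PATTERN.finditer(text)]
```

For example, `extract_claims("the myth that Elvis is alive has a long history")` returns `["Elvis is alive has a long history"]`: the naive end-of-claim rule keeps a trailing suffix, which is one reason extracted claims need filtering.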
27. 26/44 Some Disputes I Wasn’t Aware of: "The Niger-Iraq uranium connection has been discredited"; "Medieval Europeans thought the world was flat"; "Dinosaurs looked sleek and reptilian"; "Dietary cholesterol is a problem"; "“Wear and tear” causes arthritis"; "Specific foods cause ulcers". Estimates from Yahoo BOSS; not all URLs downloaded.
28. Most Disputed Nouns: 1. God, 2. Iraq, 3. Government, 4. Obama, 5. War, 6. Israel, 7. President, 8. Women, 9. Money, 10. Jesus.
29. 28/44 Search for all patterns on Yahoo BOSS. Yahoo BOSS is an API for Yahoo search. The BOSS API has a limit of 1,000 hits per query, so we salt queries with the year and month: +"falsely claimed that" +2010; +"falsely claimed that" -2010 +2009; +"falsely claimed that" -2010 -2009 +2008; +"falsely claimed that" -2010 -2009 -2008 +2007. This was needed for 197 patterns. We talked to Yahoo first... Future: get direct access to the complete results for a pattern.
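The year-salting scheme on this slide can be generated mechanically. A minimal sketch, assuming years are listed newest first; `salted_queries` is a hypothetical helper that only builds query strings (no BOSS API call is made), and the month-level salting mentioned on the slide is left out.

```python
def salted_queries(pattern: str, years: list[int]) -> list[str]:
    """One query per year: require that year and exclude every newer year already queried.

    Assumes `years` is in descending order (newest first), as on the slide.
    """
    queries = []
    for i, year in enumerate(years):
        excluded = " ".join(f"-{y}" for y in years[:i])  # years already covered
        parts = [f'+"{pattern}"', excluded, f"+{year}"]
        queries.append(" ".join(p for p in parts if p))  # drop the empty exclusion for the first year
    return queries
```

Calling `salted_queries("falsely claimed that", [2010, 2009, 2008, 2007])` reproduces the four queries listed on the slide. Each query excludes the years already covered, so no page is returned by two queries; pages that mention none of the listed years are not retrieved by any of them.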
30. 29/44 Claims need to be filtered. Examples: "the false claim that won't go away"; "falsely claimed that he didn't do it"; "wrongly think that the bill will pass"; "wrongly think that Great Britain doesn't"; "the myth that Elvis is alive has a long history"; "falsely claim that full commentary below". Problem types: fragment, ambiguous, suffix, extraction error.
31. 30/44 Labeled data from Mechanical Turk. $0.04 to label 10 claims, two of which are known. If a turker gets known items wrong, we reject their work. Each claim is labeled by two turkers.
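The quality-control rule and the cost per claim can be made concrete. A minimal sketch, assuming each HIT returns one label per item and that the slide's $0.04 pays one worker to label 10 items, 2 of them known (gold); `Submission`, `accept`, and `cost_per_claim` are hypothetical, and rejected work and platform fees are ignored.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    worker_id: str
    labels: dict[str, str]  # item id -> label chosen by the worker


def accept(sub: Submission, gold: dict[str, str]) -> bool:
    """Accept a HIT only if the worker labeled every known (gold) item correctly."""
    return all(sub.labels.get(item) == answer for item, answer in gold.items())


def cost_per_claim(price_per_hit: float, items_per_hit: int,
                   gold_per_hit: int, labels_per_claim: int) -> float:
    """Dollars spent per unknown claim, counting all of its redundant labels."""
    unknown_per_hit = items_per_hit - gold_per_hit
    return price_per_hit / unknown_per_hit * labels_per_claim
```

Under those assumptions each HIT covers 8 unknown claims at $0.005 each, and two labels per claim give `cost_per_claim(0.04, 10, 2, 2)` of about $0.01 per claim.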
32. 31/44 Problem: text may not be a statement. Examples: "the false claim that won't go away"; "the belief that works best"; "the lie that people fell for". Current approach: is the first word a verb? This finds 71% of bad claims and mistakenly drops 2% of good claims. It works for the first two examples, but not the last.
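The slide does not say how the first word is tested. A minimal sketch, assuming a fixed list of verb-like words stands in for a part-of-speech tagger; `VERB_LIKE`, `first_word_is_verb`, and `keep_claim` are hypothetical, and the list is built only from the slide's examples, so it says nothing about the 71% / 2% figures.

```python
# Hypothetical stand-in for a part-of-speech tagger: a tiny, fixed list of
# words that act as verbs in the slide's examples. A real filter would tag
# the first word instead of looking it up here.
VERB_LIKE = {"won't", "works", "is", "are", "was", "were", "has", "have", "will", "can't"}


def first_word_is_verb(claim: str) -> bool:
    words = claim.split()
    return bool(words) and words[0].lower() in VERB_LIKE


def keep_claim(claim: str) -> bool:
    """Drop extracted text that starts with a verb: it is probably not a full statement."""
    return not first_word_is_verb(claim)
```

As on the slide, the check rejects "won't go away" and "works best" but keeps "people fell for", because "people" is not a verb.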
33. 32/44 Problem: ambiguous claims. Examples (rating scale: Bad, Maybe, Good): "he didn't do it"; "the union was a party in the proceedings"; "the other parent is abusive"; "our troops have committed atrocities"; "property taxes are regressive"; "Obama is a communist". If two pages say X, do they mean the same thing? Turk: 61.9% agreement, often very subjective. Future: associate each claim with the page topic.
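The slide reports 61.9% agreement without naming the statistic. Assuming it is raw percent agreement between the two turkers who label each claim, it can be computed as below; `percent_agreement` is a hypothetical helper, and unlike Cohen's kappa it does not correct for agreement by chance.

```python
def percent_agreement(labels_a: list[str], labels_b: list[str]) -> float:
    """Share of items on which two annotators chose the same label (no chance correction)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)
```

For example, two turkers who agree on the first and third of four claims rated on the slide's Bad / Maybe / Good scale have 50% agreement.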
34. 33/44 Wikipedia links tell us what is unambiguous. Examples: "property taxes are regressive"; "Obama is a communist". Is this word always linked to the same thing? Precision: 73%; recall: 73% (vs. gold data + word features).
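One way to read "Is this word always linked to the same thing?" is as a consistency score over the Wikipedia articles a phrase links to. A minimal sketch under that assumption; `link_consistency`, `is_unambiguous`, the 0.9 threshold, and the example link targets are hypothetical, and this is not necessarily the feature behind the 73% precision and recall.

```python
from collections import Counter


def link_consistency(anchor_targets: list[str]) -> float:
    """Fraction of a phrase's Wikipedia links that point to its most common target article.

    `anchor_targets` lists the article linked from each occurrence of the phrase.
    """
    if not anchor_targets:
        return 0.0
    _, top_count = Counter(anchor_targets).most_common(1)[0]
    return top_count / len(anchor_targets)


def is_unambiguous(anchor_targets: list[str], threshold: float = 0.9) -> bool:
    # Hypothetical threshold: treat a phrase as unambiguous if it nearly always links to one article.
    return link_consistency(anchor_targets) >= threshold
```

With hypothetical link data, a phrase that always links to "Property tax" scores 1.0, while "wood" linking to `["Wood", "Wood", "Ronnie Wood", "Elijah Wood"]` scores 0.5 and is flagged as ambiguous.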
39. 38/44 Users add evidence to support claims. A claim will not be shown to others unless the user finds a source that argues against it.
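The display rule on this slide amounts to a simple check on a claim's evidence. A minimal sketch, assuming each claim stores the URLs of its supporting and opposing sources; the `Claim` class, its fields, and `visible_to_others` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    supporting: list[str] = field(default_factory=list)  # URLs of sources arguing for the claim
    opposing: list[str] = field(default_factory=list)    # URLs of sources arguing against it

    def visible_to_others(self) -> bool:
        # Shown to other users only once at least one opposing source has been found.
        return len(self.opposing) > 0
```

Supporting evidence alone never makes a claim visible; only an opposing source does, since without one there is no dispute to show.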
40. 39/44 Users identify a disputed claim on a page. Define a new disputed claim, or add a paraphrase for an existing disputed claim.
41. 40/44 User Study Results. Users were frustrated by the low number of claims that were highlighted, which motivated the text mining approach. They did not appreciate that a claim should apply to multiple pages, particularly when using the context menu approach. They were confused about how specific a claim should be, e.g. “Global temperatures will rise by X degrees”. Users created claims with ambiguous meanings, e.g. saying “wood” to mean “Ronnie Wood”. They were confused by double negatives when adding evidence, e.g. a source that "opposes" the claim “global warming does not exist”. Future: use users to improve mined claims.
43. 42/44 Entailment is resource constrained. We must compare many sentences against a huge number of claims in a fraction of a second.
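The slide states the constraint but not how it is met. A common approach, sketched here as an assumption rather than Dispute Finder's design, is a cheap inverted-index lookup that narrows each sentence down to a few candidate claims before any expensive entailment check; `ClaimIndex`, `content_words`, the stopword list, and `min_shared=2` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; real systems use a larger one.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "that", "and", "in", "it"}


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


class ClaimIndex:
    """Inverted index from content word -> ids of the claims containing it."""

    def __init__(self, claims: list[str]):
        self.claims = claims
        self.postings: dict[str, set[int]] = defaultdict(set)
        for i, claim in enumerate(claims):
            for w in content_words(claim):
                self.postings[w].add(i)

    def candidates(self, sentence: str, min_shared: int = 2) -> list[int]:
        """Ids of claims sharing at least `min_shared` content words with the sentence."""
        counts: dict[int, int] = defaultdict(int)
        for w in content_words(sentence):
            # .get avoids inserting empty entries into the defaultdict
            for i in self.postings.get(w, ()):
                counts[i] += 1
        return sorted(i for i, n in counts.items() if n >= min_shared)
```

Only the returned candidates would then go to the slower comparison, so the work per sentence grows with the number of claims that share its words rather than with the total number of claims.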
45. If you have a big enough corpus, then it works OK. Future: better entailment that still scales. Future: look at context, and at other places where the same text appears.
46. 44/44 What is Disputed? Anything disputed by anyone? We get overwhelmed with claims disputed by nutcases. Anything disputed by a “reliable source”? What is a “reliable source” (Wikipedia rules?), and do we end up enforcing “orthodox” beliefs and stifling debate? Anything disputed by a source that I would trust? We reinforce the existing echo-chamber problem. Anything disputed by my friends? Do I agree with my friends, and should I be encouraged to agree with them? Future: learn what to show a user by analyzing their behavior.
47. 45/44 Interviews: Do people want this? It is hard to change established opinions: people think they already understand the issue, and they would have to publicly back down. So focus on issues they don’t yet have an opinion on? It is hard to make someone accept the other side: social identity is tied to “us” vs. “them”, and people are not willing to listen to the “other side”. So give sources from their “own” side? Sometimes people may not care: they read just for entertainment and conversation material, don’t care much if they are wrong, and are not interested in challenging the opinions of others. Focus on issues that affect them personally. Dispute Finder probably isn’t for everyone.