Presented at the 31st ACM User Interface Software and Technology Symposium (UIST), 2018. Paper: https://www.ischool.utexas.edu/~ml/papers/nguyen-uist18.pdf
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking
1. Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact-Checking
Joint work with An Thanh Nguyen (UT), Byron Wallace (Northeastern), & more!
Matt Lease
School of Information, University of Texas at Austin
ml@utexas.edu • @mattlease
Slides: slideshare.net/mattlease
2. What’s an Information School?
“The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at over 65 universities around the world
www.ischools.org
3. Information Literacy
National Information Literacy Awareness Month,
US Presidential Proclamation, October 1, 2009.
“Though we may know how to find the information
we need, we must also know how to evaluate it.
Over the past decade, we have seen a crisis of
authenticity emerge. We now live in a world where
anyone can publish an opinion or perspective, whether
true or not, and have that opinion amplified…”
4. “Truthiness”
“Truthiness is tearing apart our country... It used to
be everyone was entitled to their own opinion, but
not their own facts. But that’s not the case anymore.”
– Stephen Colbert (Jan. 25, 2006)
9. Fake News Challenge
12. Challenges
• Fair, Accountable, & Transparent (AI)
– Why trust “black box” classifier?
– How do we reason about potential bias?
– Do people really only want to know true vs. false?
– How to integrate human knowledge/experience?
• Joint AI + Human Reasoning, Correct Errors, Personalization
• How to design strong Human + AI Partnerships?
– Horvitz, CHI’99: mixed-initiative design
– Dove et al., CHI’17 “Machine Learning As a Design Material”
13. MemeBrowser (Ryu et al., ACM HyperText’12)
http://odyssey.ischool.utexas.edu/mb
14. Nguyen et al., AAAI’18
• Crowdsourced stance labels
– Hybrid AI + Human (near real-time) Prediction
• Joint model of stance, veracity, & annotators
– Interaction between variables
– Interpretable
• Source code on GitHub
15. This Work
Demo!
16. This Work
http://fcweb.pythonanywhere.com
17. Primary Interface
18. Source Reputation
19. System Architecture
• Google Search API
• Two logistic regression models
– Stance (Ferreira & Vlachos ’16) w/ same features
• average accuracy > 70% but variable over claims
– Veracity (Popat et al. ‘17)
– scikit-learn, L1 regularization, liblinear solver, & otherwise default parameters (see the sketch below)
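A minimal sketch (not the authors' released code) of this two-classifier setup, assuming scikit-learn as named above; the hand-crafted stance features (Ferreira & Vlachos ’16) and veracity features (Popat et al. ’17) are replaced by a placeholder TF-IDF vectorizer, and variable names are illustrative:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder features: the real system uses the papers' hand-crafted
# feature sets, not TF-IDF.
stance_clf = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(penalty="l1", solver="liblinear"),  # L1 + liblinear, otherwise defaults
)
veracity_clf = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(penalty="l1", solver="liblinear"),
)

# stance_clf.fit(article_texts, stance_labels)      # e.g., for / against / observing
# veracity_clf.fit(claim_texts, veracity_labels)    # e.g., true / false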
20. Data: Train & Test
Emergent (Ferreira & Vlachos ’16)
Accuracy of prediction models
27. User Study 1: Setup
• 2 Groups: Control vs. System
– 113 participants (58 control, 55 system) – MTurk
• 1. Asked to predict claim veracity
– Likert: Def. False, Prob. False, Neutral, Prob. True, Def. True
– Error: distance on the Likert scale (see the sketch after this list)
• for a (definitely) false claim, PF -> error = 1, DT -> error = 4
• 2. Shown model’s claim prediction
– Asked if they want to change their answer
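A minimal sketch of that error metric, assuming the 5-point scale is encoded as the integers 0–4 (the exact encoding and names are illustrative assumptions):

# Likert responses mapped to integers; error is the absolute distance
# from the gold label.
LIKERT = {"Definitely False": 0, "Probably False": 1, "Neutral": 2,
          "Probably True": 3, "Definitely True": 4}

def prediction_error(response: str, gold: str) -> int:
    """Distance between a participant's answer and the gold label."""
    return abs(LIKERT[response] - LIKERT[gold])

# For a definitely false claim:
print(prediction_error("Probably False", "Definitely False"))   # 1
print(prediction_error("Definitely True", "Definitely False"))  # 4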
28. Before seeing model’s claim prediction
• “System” group (vs. control):
– claims 1-2: higher average prediction error
– claim 4: lower error
– claims 3, 5: only small differences
• Human accuracy in claim prediction roughly follows model’s accuracy in stance prediction
– i.e., helped when the model is correct, hurt when it is not
29. After seeing model’s claim prediction
• “System” group:
– Smaller changes for errors
– Changed answers more often than the “control” group
• Human accuracy in claim prediction roughly follows model’s accuracy in veracity prediction
– i.e., helped when the model is correct, hurt when it is not
30. Study 1: Statistical Significance (1 of 2)
• Mixed-effects Generalized Linear Model (GLM) (see the sketch after this list)
• Before seeing model’s claim predictions
– CSP: # correct stance predictions seen
– WSP: # wrong stance predictions seen
– 2-tail test includes the unlikely possibility that seeing correct stance predictions increases human error
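A hedged sketch of this kind of analysis (not the authors' exact model specification): a mixed-effects regression of error on CSP and WSP with a random intercept per participant, using statsmodels; the data frame and its column names are toy stand-ins for the study data.

import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in for the study data: one row per (participant, claim), with
# the participant's prediction error and the number of correct (CSP) and
# wrong (WSP) stance predictions shown for that claim.
df = pd.DataFrame({
    "error":          [0, 1, 2, 4, 0, 1, 3, 2],
    "CSP":            [5, 3, 1, 0, 4, 2, 1, 3],
    "WSP":            [0, 2, 4, 5, 1, 3, 4, 2],
    "participant_id": [1, 1, 1, 1, 2, 2, 2, 2],
})

# Random intercept per participant; two-tailed tests on the CSP and WSP
# coefficients (does seeing correct / wrong stance predictions change error?).
model = smf.mixedlm("error ~ CSP + WSP", data=df, groups=df["participant_id"])
result = model.fit()
print(result.summary())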
31. Study 1: Statistical Significance (2 of 2)
• Mixed-effects Generalized Linear Model (GLM)
• After seeing model’s claim predictions
– CSP: # correct stance predictions seen
– WSP: # wrong stance predictions seen
– 2-tail test includes the unlikely possibility that seeing correct stance predictions increases human error
33. User Study 2: Setup
• 2 Groups: Control vs. Slider
– 109 participants (51 control, 58 slider) – MTurk
• 1. Asked to predict claim veracity
– Likert: Def. False, Prob. False, Neutral, Prob. True, Def. True, and indicate confidence in prediction
– (-20, +20) score range: accuracy × confidence (see the sketch after this list)
• e.g., a correct answer with 75% confidence -> 20 × 75% = 15
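A minimal sketch of that scoring scheme, assuming a correct answer scores +20 × confidence and a wrong answer -20 × confidence (the handling of intermediate Likert answers is an assumption, and the function name is illustrative):

# Score in the (-20, +20) range: sign from correctness, magnitude from confidence.
def study2_score(correct: bool, confidence: float) -> float:
    """confidence is a fraction in [0, 1]."""
    return (20.0 if correct else -20.0) * confidence

# Example from the slide: a correct answer with 75% confidence.
print(study2_score(True, 0.75))   # 15.0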
34. User Study 2: Results
• “Slider” group:
– Claims 1-3: higher score on average
– Claims 4-5: lower first quartiles, but same medians
• Some participants negatively impacted by the slider
– No statistically significant difference on average
35. Discussion & Future Work
• Fact Checking & IR (Lease, DESIRES’18)
– How to diversify search results for controversial topics?
– Information evaluation (e.g., vaccination & autism)
• Potential harm as well as good
– Potential added confusion, data / algorithmic bias
– Potential for personal “echo chamber”
– Adversarial applications
• Future Work
– Making personalization more visible
– Collaborative use, small and big
• First author An Nguyen is looking for a postdoc!
36. Conclusion
• Fact-checking is more than black-box prediction
– Interaction, exploration, trust
• We proposed a mixed-initiative human + AI partnership for fact-checking
– Back-end AI + front-end interface/interaction
– Support AI + human collaboration
– Fair, Accountable, & Transparent (FAT) AI
37. Thank You!
Slides: slideshare.net/mattlease
Lab: ir.ischool.utexas.edu