Towards Explainable
Fact Checking
Isabelle Augenstein
augenstein@di.ku.dk
@IAugenstein
http://isabelleaugenstein.github.io/
DIKU Business Club
18 March 2021
Overview of Today’s Talk
• Fact checking – what is it and why do we need it?
• False information online
• Content-based automatic fact checking
• Explainability – what is it and why do we need it?
• Making the right predictions for the right reasons
• Model training pipeline
• Explainable fact checking – some first solutions
• Rationale selection
• Generating free-text explanations
• Wrap-up
Fact Checking
Types of False Information
https://ec.europa.eu/info/live-work-travel-eu/coronavirus-response/fighting-disinformation/tackling-coronavirus-disinformation_en
http://www.contentrow.com/tools/link-bait-title-generator
https://rokokoposten.dk/2021/02/23/mars-indfoerer-indrejseforbud-efter-fund-af-britisk-virusvariant/
• Disinformation:
• Intentionally false, spread deliberately
• Misinformation:
• Unintentionally false information
• Clickbait:
• Exaggerating information and under-delivering it
• Satire:
• Intentionally false for humorous purposes
• Biased Reporting:
• Reporting only some of the facts to serve an agenda
Full Fact Checking Pipeline
• Claim Check-Worthiness Detection:
• Input: “Immigrants are a drain on the economy”
• Output: check-worthy / not check-worthy
• Evidence Document Retrieval:
• Input: “Immigrants are a drain on the economy”
• Stance Detection / Textual Entailment:
• Input: “Immigrants are a drain on the economy”, “EU immigrants have a positive impact on public finances”
• Output: positive / negative / neutral
• Veracity Prediction:
• Input: “Immigrants are a drain on the economy”
• Output: true / false / not enough info
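The four stages can be chained as a simple function pipeline. The sketch below is only an illustration of the data flow: every rule in it (the word-count check, the overlap-based retrieval, the cue-word lists, the majority vote) is a hypothetical stand-in for a trained model, and the second corpus sentence is made up for the example.

```python
import re
from typing import Dict, List, Set

# Hypothetical cue words; a real system would use trained classifiers.
NEGATIVE_CUES = {"drain", "burden", "harm"}
POSITIVE_CUES = {"positive", "benefit", "boost"}


def tokens(text: str) -> Set[str]:
    """Lowercased set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_check_worthy(claim: str) -> bool:
    """Stage 1 (toy rule): a non-question with at least four words."""
    return len(claim.split()) >= 4 and not claim.strip().endswith("?")


def retrieve_evidence(claim: str, corpus: List[str], k: int = 1) -> List[str]:
    """Stage 2 (toy rule): rank documents by word overlap with the claim."""
    claim_tokens = tokens(claim)
    ranked = sorted(corpus, key=lambda d: len(claim_tokens & tokens(d)), reverse=True)
    return [d for d in ranked[:k] if claim_tokens & tokens(d)]


def polarity(text: str) -> int:
    """Positive cue count minus negative cue count."""
    t = tokens(text)
    return len(t & POSITIVE_CUES) - len(t & NEGATIVE_CUES)


def detect_stance(claim: str, evidence: str) -> str:
    """Stage 3 (toy rule): same polarity agrees, opposite polarity disagrees."""
    product = polarity(claim) * polarity(evidence)
    if product > 0:
        return "positive"
    if product < 0:
        return "negative"
    return "neutral"


def predict_veracity(stances: List[str]) -> str:
    """Stage 4 (toy rule): majority vote over evidence stances."""
    support, refute = stances.count("positive"), stances.count("negative")
    if support > refute:
        return "true"
    if refute > support:
        return "false"
    return "not enough info"


def fact_check(claim: str, corpus: List[str]) -> Dict[str, object]:
    """Run the four stages in order, stopping early for non-check-worthy input."""
    if not is_check_worthy(claim):
        return {"claim": claim, "check_worthy": False}
    evidence = retrieve_evidence(claim, corpus)
    stances = [detect_stance(claim, doc) for doc in evidence]
    return {"claim": claim, "check_worthy": True, "evidence": evidence,
            "stances": stances, "verdict": predict_veracity(stances)}


claim = "Immigrants are a drain on the economy"
corpus = [
    "EU immigrants have a positive impact on public finances",
    "The weather in Copenhagen is mild in spring",  # made-up distractor
]
result = fact_check(claim, corpus)
print(result["stances"], result["verdict"])
```

The verdict printed here follows from the toy rules only and says nothing about the actual claim. The point is the interface: each stage consumes the previous stage's output, so an error early in the chain propagates to the final label.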
Explainability
Explainability – what is it and why do we need it?
Roles: Architect, Trainer, End User
Terminology borrowed from Strobelt et al. (2018), “LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent
Neural Networks”, IEEE Transactions on Visualization and Computer Graphics.
Explainability – what is it and why do we need it?
• Right prediction, right reasons:
Claim: “In the COVID-19 crisis, ‘only 20%
of African Americans had jobs where they
could work from home.’”
Evidence: “20% of black workers said they
could work from home in their primary
job, compared to 30% of white workers.”
• Wrong prediction, right reasons:
Claim: “Children don’t seem to be getting
this virus.”
Evidence: “There have been no reported
incidents of infection in children.”
• Right prediction, wrong reasons:
Claim: “Taylor Swift had a fatal car
accident.”
Reason: overfitting to spurious patterns
(celebrity death hoaxes are common)
• Wrong prediction, wrong reasons:
Claim: “Michael Jackson is still alive,
appears in daughter’s selfie.”
Reason: overfitting to spurious patterns
(celebrity death hoaxes are common)
Towards Explainable Fact Checking
Explanation via Rationale Selection -- Sentences
Claim: The Faroe Islands are no longer part of a kingdom.
Evidence document: The Faroe Islands, also called the Faeroes, is an
archipelago between the Norwegian Sea and the North Atlantic, about halfway
between Norway and Iceland, 200 mi north-northwest of Scotland.
The Islands are an autonomous country within the Kingdom of Denmark.
Its area is about 1,400 km2 with a population of 50,030 in April 2017.
(…)
Label: refutes
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal (2018). FEVER: a
large-scale dataset for Fact Extraction and VERification. In Proc. of NAACL.
https://arxiv.org/abs/1803.05355
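Sentence-level rationale selection can be pictured as ranking the evidence sentences against the claim and keeping the top ones. The sketch below uses plain word overlap (Jaccard similarity) as the scoring function. That choice is an assumption made for illustration, not the method used in FEVER, where sentence selection is learned; the helper names are hypothetical.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_rationale(claim: str, sentences: List[str], k: int = 1) -> List[str]:
    """Return the k sentences with the highest word overlap with the claim."""
    claim_tokens = tokenize(claim)
    ranked = sorted(sentences,
                    key=lambda s: jaccard(claim_tokens, tokenize(s)),
                    reverse=True)
    return ranked[:k]


claim = "The Faroe Islands are no longer part of a kingdom."
evidence_sentences = [
    "The Faroe Islands, also called the Faeroes, is an archipelago between "
    "the Norwegian Sea and the North Atlantic, about halfway between Norway "
    "and Iceland, 200 mi north-northwest of Scotland.",
    "The Islands are an autonomous country within the Kingdom of Denmark.",
    "Its area is about 1,400 km2 with a population of 50,030 in April 2017.",
]
rationale = select_rationale(claim, evidence_sentences, k=1)
print(rationale[0])
```

On this example the overlap heuristic happens to pick the sentence about the Kingdom of Denmark, which is the one a reader needs to see why the label is "refutes". Lexical overlap fails easily on paraphrased evidence, which is one reason learned selectors are used in practice.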
Explanation via Rationale Selection -- Words
Hypothesis: An adult dressed in black holds a stick.
Premise: An adult is walking away empty-handedly.
Label: contradiction
Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom (2018). e-SNLI: Natural Language
Inference with Natural Language Explanations. In Proc. of NeurIPS.
https://proceedings.neurips.cc/paper/2018/hash/4c7a167bb329bd92580a99ce422d6fa6-Abstract.html
Post-Hoc Explainability Methods via Rationale Selection
Example from Twitter Sentiment Extraction (TSE) dataset
Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein (2020). A Diagnostic Study of Explainability
Techniques for Text Classification. In EMNLP. https://www.aclweb.org/anthology/2020.emnlp-main.263/
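A post-hoc method assigns each input token a saliency score after the model has been trained. Occlusion is one of the simplest such techniques: replace one token at a time with a mask and record how much the model's confidence drops. The sketch below assumes a toy keyword-based sentiment scorer in place of a real classifier; the word list, the `[MASK]` string, and the example sentence are all made up for illustration.

```python
from typing import Callable, List

# Hypothetical negative-sentiment vocabulary for the toy model.
NEGATIVE_WORDS = {"awful", "late"}


def toy_negative_proba(tokens: List[str]) -> float:
    """Toy stand-in for a classifier: more negative words, higher score."""
    hits = sum(t in NEGATIVE_WORDS for t in tokens)
    return hits / (hits + 1)


def occlusion_saliency(tokens: List[str],
                       predict_proba: Callable[[List[str]], float],
                       mask: str = "[MASK]") -> List[float]:
    """Saliency of token i = confidence drop when token i is masked."""
    base = predict_proba(tokens)
    return [base - predict_proba(tokens[:i] + [mask] + tokens[i + 1:])
            for i in range(len(tokens))]


tokens = ["my", "flight", "was", "awful", "and", "late"]
scores = occlusion_saliency(tokens, toy_negative_proba)
rationale = [t for t, s in zip(tokens, scores) if s > 0]
print(rationale)
```

The tokens with a positive score form the selected rationale. Occlusion needs one forward pass per token and treats the model as a black box, whereas gradient-based methods need access to the model internals.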
Post-Hoc Explainability Methods via Rationale Selection:
Diagnostic Properties
• Agreement with Human Rationales (HA)
• Degree of overlap between human and automatic saliency scores
• Confidence Indication (CI)
• Predictive power of produced explanations for model’s confidence
• Faithfulness (F)
• Mask most salient tokens, measure drop in performance
• Rationale Consistency (RC)
• Difference between explanations for models trained with different
random seeds, and for a model with random weights
• Dataset Consistency (DC)
• Difference between explanations for similar instances
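Two of these properties are easy to sketch in a few lines. The code below is a simplified illustration, not the measures used in the study: agreement with human rationales is approximated as the fraction of the top-k salient tokens that a human also marked, and faithfulness as the confidence drop after masking the top-k salient tokens. The toy classifier, the saliency values, and the human annotation are invented for the example.

```python
from typing import Callable, List, Sequence

MASK = "[MASK]"
POSITIVE_WORDS = {"great", "love"}  # hypothetical vocabulary


def toy_positive_proba(tokens: List[str]) -> float:
    """Toy stand-in for a classifier's confidence in the positive class."""
    hits = sum(t in POSITIVE_WORDS for t in tokens)
    return hits / (hits + 1)


def top_k_indices(saliency: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest saliency scores, in input order."""
    order = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(order[:k])


def human_agreement(saliency: Sequence[float],
                    human_mask: Sequence[int], k: int) -> float:
    """Fraction of the top-k salient tokens that humans also marked."""
    return sum(human_mask[i] for i in top_k_indices(saliency, k)) / k


def faithfulness_drop(tokens: List[str], saliency: Sequence[float],
                      predict_proba: Callable[[List[str]], float],
                      k: int) -> float:
    """Confidence drop after masking the k most salient tokens."""
    top = set(top_k_indices(saliency, k))
    masked = [MASK if i in top else t for i, t in enumerate(tokens)]
    return predict_proba(tokens) - predict_proba(masked)


tokens = ["i", "love", "this", "great", "phone"]
human_mask = [0, 1, 0, 1, 0]               # tokens a human marked as the rationale
good_saliency = [0.05, 0.60, 0.05, 0.25, 0.05]
poor_saliency = [0.90, 0.00, 0.00, 0.00, 0.80]

for name, sal in [("good", good_saliency), ("poor", poor_saliency)]:
    print(name,
          human_agreement(sal, human_mask, k=2),
          round(faithfulness_drop(tokens, sal, toy_positive_proba, k=2), 3))
```

An explanation that highlights the tokens the model actually relies on causes a large drop when those tokens are masked, while an explanation that highlights irrelevant tokens causes none. Comparing the two saliency vectors shows how the same model can be paired with explanations of very different quality.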
Post-Hoc Explainability Methods via Rationale Selection:
Selected Results
Spider chart for Transformer model on e-SNLI
HA: Agreement with
human rationales
CI: Confidence indication
F: Faithfulness
RC: Rationale Consistency
DC: Dataset Consistency
Post-Hoc Explainability Methods via Rationale Selection:
Aggregated Results
[Bar chart: mean of diagnostic property measures for e-SNLI (y-axis from 0.4 to 0.7) for Random, ShapSampl, LIME, Occlusion, Saliency_u, Saliency_l2, InputXGrad_u, InputXGrad_l2, GuidedBP_u and GuidedBP_l2, shown for Transformer, CNN and LSTM models]
Generating Free Text Explanations
https://www.politifact.com/factchecks/2020/apr/30/donald-trump/trumps-boasts-testing-spring-cherry-picking-data/
Parts of the fact-check article highlighted on the slide: Ruling Comments, Justification
Claim:
We’ve tested more than every country
combined.
Ruling Comments:
Responding to weeks of criticism over his administration’s
COVID-19 response, President Donald Trump claimed at a
White House briefing that the United States has well
surpassed other countries in testing people for the virus.
"We’ve tested more than every country combined,"
Trump said April 27 […] We emailed the White House for
comment but never heard back, so we turned to the data.
Trump’s claim didn’t stand up to scrutiny.
In raw numbers, the United States has tested more
people than any other individual country — but
nowhere near more than "every country combined" or,
as he said in his tweet, more than "all major countries
combined." […] The United States has a far bigger
population than many of the "major countries" Trump often
mentions. So it could have run far more tests but still
have a much larger burden ahead than do nations like
Germany, France or Canada.[…]
Joint Model
Outputs: Veracity Label, Justification/Explanation
Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein (2020). Generating Fact Checking
Explanations. In ACL. https://www.aclweb.org/anthology/2020.acl-main.656/
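The slide does not spell out how the two outputs are trained together. One common way to train a model with two heads is to minimise a weighted sum of the per-task losses. The sketch below assumes that setup, and further assumes that the explanation is produced by selecting sentences from the ruling comments, so the explanation loss is a per-sentence binary cross-entropy. The function names, the weight parameter, and all numbers are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], gold_index: int) -> float:
    """Negative log-probability assigned to the gold veracity label."""
    return -math.log(probs[gold_index])


def binary_cross_entropy(pred: Sequence[float], gold: Sequence[int]) -> float:
    """Mean binary cross-entropy over sentence selection decisions."""
    total = sum(g * math.log(p) + (1 - g) * math.log(1 - p)
                for p, g in zip(pred, gold))
    return -total / len(pred)


def joint_loss(label_probs: Sequence[float], gold_label: int,
               sentence_probs: Sequence[float], gold_selection: Sequence[int],
               explanation_weight: float = 1.0) -> float:
    """Veracity loss plus weighted explanation (sentence selection) loss."""
    return (cross_entropy(label_probs, gold_label)
            + explanation_weight * binary_cross_entropy(sentence_probs, gold_selection))


# Made-up model outputs for one claim: three veracity classes, two sentences.
label_probs = [0.1, 0.8, 0.1]     # predicted distribution over labels
sentence_probs = [0.9, 0.2]       # probability each sentence is in the explanation
loss = joint_loss(label_probs, gold_label=1,
                  sentence_probs=sentence_probs, gold_selection=[1, 0])
print(round(loss, 4))
```

Setting `explanation_weight` to zero recovers a veracity-only model, which makes it easy to compare joint and separate training under otherwise identical conditions.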
Generating Free Text Explanations:
(Manual) Evaluation Measures
• Explanation Quality
• Coverage. The explanation contains important, salient information
and does not miss any points important to the fact check.
• Non-redundancy. The explanation does not contain any information
that is redundant/repeated/not relevant to the claim or the fact
check.
• Non-contradiction. The explanation does not contain any pieces of
information contradictory to the claim or the fact check.
• Overall. Overall explanation quality.
• Explanation Informativeness. Veracity label for a claim
provided based on the automatically generated
explanations only.
Wrap-Up
Overall Take-Aways
• Why explainability?
• understanding if a model is right for the right reasons
• Generated explanations can help users understand:
• inner workings of a model (model understanding)
• how a model arrived at a prediction (decision understanding)
• Explainability can enable:
• human-in-the-loop model development
• human-in-the-loop data selection
Overall Take-Aways
• Caveats:
• There can be more than one correct explanation
• Different explainability methods provide different explanations
• Most work on simple (image/text) classification tasks
• Different streams of explainability methods have different
benefits and downsides
• Perform well w.r.t. different properties
• Requirements for annotated training data
• Joint vs. post-hoc explanation generation
• Black box vs. white box
• Hypothesis testing vs. bottom-up understanding
• One-time analysis vs. continuous monitoring
Where to from here?
• Making explanations useful to model trainers and
end users
• Understanding how to interpret different explanations
• Combination of model and decision understanding
• Different explanation granularity for different users (e.g. content
moderator vs. online platform user)
• Some research on generating explanations, relatively little
work on understanding in what context they are useful
• (Automatically) evaluating explanations
• Human-in-the-loop development
Thank you!
isabelleaugenstein.github.io
augenstein@di.ku.dk
@IAugenstein
github.com/isabelleaugenstein
Thanks to my PhD students and colleagues!
CopeNLU
https://copenlu.github.io/
Pepa Atanasova Dustin Wright Jakob Grue Simonsen Christina Lioma
Referenced Papers
Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, Alexander M. Rush.
LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent
Neural Networks. In Proceedings of IEEE TVCG 2018.
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal.
FEVER: a large-scale dataset for Fact Extraction and VERification. In
Proceedings of NAACL 2018.
Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom.
e-SNLI: Natural Language Inference with Natural Language Explanations. In
Proceedings of NeurIPS 2018.
Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein. A
Diagnostic Study of Explainability Techniques for Text Classification. In
Proceedings of EMNLP 2020.
Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein.
Generating Fact Checking Explanations. In Proceedings of ACL 2020.
Other Relevant Papers
Pepa Atanasova*, Dustin Wright*, Isabelle Augenstein. Generating Label
Cohesive and Well-Formed Adversarial Claims. EMNLP 2020.
Nils Rethmeier, Vageesh Kumar Saxena, Isabelle Augenstein. TX-Ray:
Quantifying and Explaining Model-Knowledge Transfer in (Un)Supervised NLP.
In Proceedings of the Conference on UAI 2020.
Wojciech Ostrowski, Arnav Arora, Pepa Atanasova, Isabelle Augenstein.
Multi-Hop Fact Checking of Political Claims. Preprint, 2020.
Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima,
Casper Hansen, Christian Hansen and Jakob Grue Simonsen. MultiFC: A
Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. In
Proceedings of EMNLP 2019.
Liesbeth Allein, Isabelle Augenstein, Marie-Francine Moens. Time-Aware
Evidence Ranking for Fact-Checking. Preprint, 2020.
*equal contributions

More Related Content

What's hot

Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML EngineersIntro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Daniel Zivkovic
 

What's hot (20)

Data Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data IntelligenceData Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data Intelligence
 
How ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundlyHow ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundly
 
Making Testing Easy w GitHub Copilot.pdf
Making Testing Easy w GitHub Copilot.pdfMaking Testing Easy w GitHub Copilot.pdf
Making Testing Easy w GitHub Copilot.pdf
 
Modern Data Flow
Modern Data FlowModern Data Flow
Modern Data Flow
 
Transforming Insurance Analytics with Big Data and Automated Machine Learning

Transforming Insurance Analytics with Big Data and Automated Machine Learning
Transforming Insurance Analytics with Big Data and Automated Machine Learning

Transforming Insurance Analytics with Big Data and Automated Machine Learning

 
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML EngineersIntro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
 
Innnovations in online teaching and learning: CHatGPT and other artificial as...
Innnovations in online teaching and learning: CHatGPT and other artificial as...Innnovations in online teaching and learning: CHatGPT and other artificial as...
Innnovations in online teaching and learning: CHatGPT and other artificial as...
 
Historias de Ciencia de Datos desde la Trinchera
Historias de Ciencia de Datos desde la TrincheraHistorias de Ciencia de Datos desde la Trinchera
Historias de Ciencia de Datos desde la Trinchera
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best Practices
 
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
Microsoft + OpenAI: Recent Updates (Machine Learning 15minutes! Broadcast #74)
 
Azure ML Studio
Azure ML StudioAzure ML Studio
Azure ML Studio
 
pwc-data-mesh.pdf
pwc-data-mesh.pdfpwc-data-mesh.pdf
pwc-data-mesh.pdf
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
ChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptx
 
Test Automation for Data Warehouses
Test Automation for Data Warehouses Test Automation for Data Warehouses
Test Automation for Data Warehouses
 
Generative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdfGenerative-AI-in-enterprise-20230615.pdf
Generative-AI-in-enterprise-20230615.pdf
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Foundry technical intro
Foundry technical introFoundry technical intro
Foundry technical intro
 
Alteryx Desktop Designer Overview
Alteryx Desktop Designer OverviewAlteryx Desktop Designer Overview
Alteryx Desktop Designer Overview
 
Conversational AI and Chatbot Integrations
Conversational AI and Chatbot IntegrationsConversational AI and Chatbot Integrations
Conversational AI and Chatbot Integrations
 

Similar to Towards Explainable Fact Checking (DIKU Business Club presentation)

Graphic Representation Grading GuideCOMTM541 Version 22.docx
Graphic Representation Grading GuideCOMTM541 Version 22.docxGraphic Representation Grading GuideCOMTM541 Version 22.docx
Graphic Representation Grading GuideCOMTM541 Version 22.docx
whittemorelucilla
 
AI and Social Justice: From Avoiding Harms to Positive Action
AI and Social Justice: From Avoiding Harms to Positive ActionAI and Social Justice: From Avoiding Harms to Positive Action
AI and Social Justice: From Avoiding Harms to Positive Action
Alan Dix
 
Machine Learning, Data Mining, and
Machine Learning, Data Mining, and Machine Learning, Data Mining, and
Machine Learning, Data Mining, and
butest
 
Статистика для журналистов
Статистика для журналистовСтатистика для журналистов
Статистика для журналистов
Nataly Nikitina
 

Similar to Towards Explainable Fact Checking (DIKU Business Club presentation) (20)

Explainability for NLP
Explainability for NLPExplainability for NLP
Explainability for NLP
 
Towards Explainable Fact Checking
Towards Explainable Fact CheckingTowards Explainable Fact Checking
Towards Explainable Fact Checking
 
[Slides] What Do We Do with All This Big Data by Altimeter Group
[Slides] What Do We Do with All This Big Data by Altimeter Group[Slides] What Do We Do with All This Big Data by Altimeter Group
[Slides] What Do We Do with All This Big Data by Altimeter Group
 
Graphic Representation Grading GuideCOMTM541 Version 22.docx
Graphic Representation Grading GuideCOMTM541 Version 22.docxGraphic Representation Grading GuideCOMTM541 Version 22.docx
Graphic Representation Grading GuideCOMTM541 Version 22.docx
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19
 
Big Data and AI for Covid-19
Big Data and AI for Covid-19Big Data and AI for Covid-19
Big Data and AI for Covid-19
 
인공지능은 의료를 어떻게 혁신할 것인가 (ver 2)
인공지능은 의료를 어떻게 혁신할 것인가 (ver 2)인공지능은 의료를 어떻게 혁신할 것인가 (ver 2)
인공지능은 의료를 어떻게 혁신할 것인가 (ver 2)
 
Essay Writing Online. . Step-By-Step Guide to Essay Writing - ESL Buzz
Essay Writing Online. . Step-By-Step Guide to Essay Writing - ESL BuzzEssay Writing Online. . Step-By-Step Guide to Essay Writing - ESL Buzz
Essay Writing Online. . Step-By-Step Guide to Essay Writing - ESL Buzz
 
Suggested Annotative Bibliography Essay
Suggested Annotative Bibliography EssaySuggested Annotative Bibliography Essay
Suggested Annotative Bibliography Essay
 
How To Write A Good Paragraph 1 Paragraph Wr
How To Write A Good Paragraph 1 Paragraph WrHow To Write A Good Paragraph 1 Paragraph Wr
How To Write A Good Paragraph 1 Paragraph Wr
 
AI and Social Justice: From Avoiding Harms to Positive Action
AI and Social Justice: From Avoiding Harms to Positive ActionAI and Social Justice: From Avoiding Harms to Positive Action
AI and Social Justice: From Avoiding Harms to Positive Action
 
Machine Learning, Data Mining, and
Machine Learning, Data Mining, and Machine Learning, Data Mining, and
Machine Learning, Data Mining, and
 
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
 
Owl Research - Fact Books Owl Theme Classroom, Ow
Owl Research - Fact Books Owl Theme Classroom, OwOwl Research - Fact Books Owl Theme Classroom, Ow
Owl Research - Fact Books Owl Theme Classroom, Ow
 
Storyfying your Data: How to go from Data to Insights to Stories
Storyfying your Data: How to go from Data to Insights to StoriesStoryfying your Data: How to go from Data to Insights to Stories
Storyfying your Data: How to go from Data to Insights to Stories
 
Social Media in Pharma Summit 2011: Drug Safety
Social Media in Pharma Summit 2011: Drug SafetySocial Media in Pharma Summit 2011: Drug Safety
Social Media in Pharma Summit 2011: Drug Safety
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
Avoiding Machine Learning Pitfalls 2-10-18
Avoiding Machine Learning Pitfalls 2-10-18Avoiding Machine Learning Pitfalls 2-10-18
Avoiding Machine Learning Pitfalls 2-10-18
 
Статистика для журналистов
Статистика для журналистовСтатистика для журналистов
Статистика для журналистов
 
The age of analytics
The age of analyticsThe age of analytics
The age of analytics
 

More from Isabelle Augenstein

Learning with limited labelled data in NLP: multi-task learning and beyond
Learning with limited labelled data in NLP: multi-task learning and beyondLearning with limited labelled data in NLP: multi-task learning and beyond
Learning with limited labelled data in NLP: multi-task learning and beyond
Isabelle Augenstein
 

More from Isabelle Augenstein (20)

Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
 
Tracking False Information Online
Tracking False Information OnlineTracking False Information Online
Tracking False Information Online
 
What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...What can typological knowledge bases and language representations tell us abo...
What can typological knowledge bases and language representations tell us abo...
 
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
 
Learning with limited labelled data in NLP: multi-task learning and beyond
Learning with limited labelled data in NLP: multi-task learning and beyondLearning with limited labelled data in NLP: multi-task learning and beyond
Learning with limited labelled data in NLP: multi-task learning and beyond
 
Learning to read for automated fact checking
Learning to read for automated fact checkingLearning to read for automated fact checking
Learning to read for automated fact checking
 
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
 
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
 
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
 
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
 
Weakly Supervised Machine Reading
Weakly Supervised Machine ReadingWeakly Supervised Machine Reading
Weakly Supervised Machine Reading
 
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
USFD at SemEval-2016 - Stance Detection on Twitter with AutoencodersUSFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
 
Distant Supervision with Imitation Learning
Distant Supervision with Imitation LearningDistant Supervision with Imitation Learning
Distant Supervision with Imitation Learning
 
Extracting Relations between Non-Standard Entities using Distant Supervision ...
Extracting Relations between Non-Standard Entities using Distant Supervision ...Extracting Relations between Non-Standard Entities using Distant Supervision ...
Extracting Relations between Non-Standard Entities using Distant Supervision ...
 
Information Extraction with Linked Data
Information Extraction with Linked DataInformation Extraction with Linked Data
Information Extraction with Linked Data
 
Lodifier: Generating Linked Data from Unstructured Text
Lodifier: Generating Linked Data from Unstructured TextLodifier: Generating Linked Data from Unstructured Text
Lodifier: Generating Linked Data from Unstructured Text
 
Relation Extraction from the Web using Distant Supervision
Relation Extraction from the Web using Distant SupervisionRelation Extraction from the Web using Distant Supervision
Relation Extraction from the Web using Distant Supervision
 
Natural Language Processing for the Semantic Web
Natural Language Processing for the Semantic WebNatural Language Processing for the Semantic Web
Natural Language Processing for the Semantic Web
 
Seed Selection for Distantly Supervised Web-Based Relation Extraction
Seed Selection for Distantly Supervised Web-Based Relation ExtractionSeed Selection for Distantly Supervised Web-Based Relation Extraction
Seed Selection for Distantly Supervised Web-Based Relation Extraction
 
Mapping Keywords to
Mapping Keywords to Mapping Keywords to
Mapping Keywords to
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty

Towards Explainable Fact Checking (DIKU Business Club presentation)

  • 1. Towards Explainable Fact Checking Isabelle Augenstein augenstein@di.ku.dk @IAugenstein http://isabelleaugenstein.github.io/ DIKU Business Club 18 March 2021
  • 2. Overview of Today’s Talk • Fact checking – what is it and why do we need it? • False information online • Content-based automatic fact checking • Explainability – what is it and why do we need it? • Making the right predictions for the right reasons • Model training pipeline • Explainable fact checking – some first solutions • Rationale selection • Generating free-text explanations • Wrap-up
  • 4. Types of False Information https://ec.europa.eu/info/live-work-travel-eu/coronavirus-response/fighting-disinformation/tackling-coronavirus-disinformation_en
  • 5. Types of False Information
  • 6. Types of False Information http://www.contentrow.com/tools/link-bait-title-generator
  • 7. Types of False Information https://rokokoposten.dk/2021/02/23/mars-indfoerer-indrejseforbud-efter-fund-af-britisk-virusvariant/
  • 8. Types of False Information
  • 9. Types of False Information • Disinformation: • Intentionally false, spread deliberately • Misinformation: • Unintentionally false information • Clickbait: • Exaggerating information and under-delivering it • Satire: • Intentionally false for humorous purposes • Biased Reporting: • Reporting only some of the facts to serve an agenda
  • 11. Full Fact Checking Pipeline • Claim Check-Worthiness Detection: “Immigrants are a drain on the economy” → check-worthy / not check-worthy • Evidence Document Retrieval: “Immigrants are a drain on the economy” • Stance Detection / Textual Entailment: “Immigrants are a drain on the economy”, “EU immigrants have a positive impact on public finances” → positive / negative / neutral • Veracity Prediction: “Immigrants are a drain on the economy” → true / false / not enough info
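The four stages on this slide can be read as a chain of functions. The following is only a minimal sketch of that reading, not the speaker's system: `FactCheckResult`, `run_pipeline` and all `toy_*` functions are hypothetical names, and the toy stages return hard-coded placeholder values where a real pipeline would call trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FactCheckResult:
    claim: str
    check_worthy: bool
    evidence: List[str]
    stances: List[str]       # "positive", "negative" or "neutral", one per evidence item
    veracity: Optional[str]  # "true", "false", "not enough info"; None if not check-worthy


def run_pipeline(
    claim: str,
    is_check_worthy: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    detect_stance: Callable[[str, str], str],
    predict_veracity: Callable[[List[str]], str],
) -> FactCheckResult:
    # Stage 1: claim check-worthiness detection.
    if not is_check_worthy(claim):
        return FactCheckResult(claim, False, [], [], None)
    # Stage 2: evidence document retrieval.
    evidence = retrieve(claim)
    # Stage 3: stance detection / textual entailment, one label per evidence item.
    stances = [detect_stance(claim, doc) for doc in evidence]
    # Stage 4: veracity prediction from the aggregated stances.
    return FactCheckResult(claim, True, evidence, stances, predict_veracity(stances))


# Hypothetical stand-ins for trained models (assumptions, for illustration only).
def toy_check_worthy(claim: str) -> bool:
    return "economy" in claim.lower()


def toy_retrieve(claim: str) -> List[str]:
    return ["EU immigrants have a positive impact on public finances"]


def toy_stance(claim: str, evidence: str) -> str:
    return "negative"  # hard-coded placeholder, not the output of a real model


def toy_veracity(stances: List[str]) -> str:
    positive, negative = stances.count("positive"), stances.count("negative")
    if positive > negative:
        return "true"
    if negative > positive:
        return "false"
    return "not enough info"


result = run_pipeline(
    "Immigrants are a drain on the economy",
    toy_check_worthy, toy_retrieve, toy_stance, toy_veracity,
)
print(result.veracity)
```

Passing the stages in as callables keeps each one replaceable, which matches how the slide presents them as separate tasks.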
  • 13. Explainability – what is it and why do we need it? User roles: Architect, Trainer, End User. Terminology borrowed from Strobelt et al. (2018), “LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks”, IEEE Transactions on Visualization and Computer Graphics.
  • 14. Explainability – what is it and why do we need it? [Grid: right vs. wrong prediction, right vs. wrong reasons] • Claim: “In the COVID-19 crisis, ‘only 20% of African Americans had jobs where they could work from home.’” Evidence: “20% of black workers said they could work from home in their primary job, compared to 30% of white workers.” • Claim: “Children don’t seem to be getting this virus.” Evidence: “There have been no reported incidents of infection in children.” • Claim: “Taylor Swift had a fatal car accident.” Reason: overfitting to spurious patterns (celebrity death hoaxes are common) • Claim: “Michael Jackson is still alive, appears in daughter’s selfie.” Reason: overfitting to spurious patterns (celebrity death hoaxes are common)
  • 15. Explainability – what is it and why do we need it? Architect Trainer End User ? true
  • 19. Explanation via Rationale Selection -- Sentences Claim: The Faroe Islands are no longer part of a kingdom. Evidence document: The Faroe Islands, also called the Faeroes, is an archipelago between the Norwegian Sea and the North Atlantic, about halfway between Norway and Iceland, 200 mi north-northwest of Scotland. The Islands are an autonomous country within the Kingdom of Denmark. Its area is about 1,400 km2 with a population of 50,030 in April 2017. (…) Label: refutes James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal (2018). FEVER: a large-scale dataset for Fact Extraction and VERification. In Proc. of NAACL. https://arxiv.org/abs/1803.05355
  • 21. Explanation via Rationale Selection -- Words Hypothesis: An adult dressed in black holds a stick. Premise: An adult is walking away empty-handedly. Label: contradiction Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom (2018). e-SNLI: Natural Language Inference with Natural Language Explanations. In Proc. of NeurIPS. https://proceedings.neurips.cc/paper/2018/hash/4c7a167bb329bd92580a99ce422d6fa6-Abstract.html
  • 23. Post-Hoc Explainability Methods via Rationale Selection Example from Twitter Sentiment Extraction (TSE) dataset Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein (2020). A Diagnostic Study of Explainability Techniques for Text Classification. In EMNLP. https://www.aclweb.org/anthology/2020.emnlp-main.263/
  • 24. Post-Hoc Explainability Methods via Rationale Selection: Diagnostic Properties • Agreement with Human Rationales (HA) • Degree of overlap between human and automatic saliency scores • Confidence Indication (CI) • Predictive power of produced explanations for model’s confidence • Faithfulness (F) • Mask most salient tokens, measure drop in performance • Rationale Consistency (RC) • Difference between explanations for models trained with different random seeds, with model with random weights • Dataset Consistency (DC) • Difference between explanations for similar instances
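The faithfulness property ("mask most salient tokens, measure drop in performance") can be illustrated on a single instance. This is a simplified sketch under stated assumptions, not the evaluation protocol of the cited paper: it measures the drop in one prediction's confidence rather than in dataset-level performance, and `mask_top_k`, `confidence_drop`, the `[MASK]` placeholder and the toy bag-of-words model are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

MASK = "[MASK]"  # assumed placeholder token


def mask_top_k(tokens: Sequence[str], saliency: Sequence[float], k: int) -> List[str]:
    """Replace the k tokens with the highest saliency scores by MASK."""
    ranked = sorted(range(len(tokens)), key=lambda i: saliency[i], reverse=True)
    top = set(ranked[:k])
    return [MASK if i in top else tok for i, tok in enumerate(tokens)]


def confidence_drop(
    predict: Callable[[Sequence[str]], float],
    tokens: Sequence[str],
    saliency: Sequence[float],
    k: int,
) -> float:
    """Confidence before masking minus confidence after masking the top-k tokens."""
    return predict(tokens) - predict(mask_top_k(tokens, saliency, k))


# Toy stand-in for a trained classifier: a sigmoid over summed word weights.
WEIGHTS = {"great": 2.0, "love": 1.5, "terrible": -2.0}


def toy_predict(tokens: Sequence[str]) -> float:
    score = sum(WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-score))


tokens = ["i", "love", "this", "great", "movie"]
good_saliency = [0.0, 0.6, 0.0, 0.9, 0.0]  # points at the words the toy model uses
bad_saliency = [0.9, 0.0, 0.6, 0.0, 0.0]   # points at words the toy model ignores

print(confidence_drop(toy_predict, tokens, good_saliency, k=2))
print(confidence_drop(toy_predict, tokens, bad_saliency, k=2))
```

Under this measure, an explanation that points at the tokens the model relies on produces a larger drop than one that points at irrelevant tokens.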
  • 25. Post-Hoc Explainability Methods via Rationale Selection: Selected Results Spider chart for Transformer model on e-SNLI HA: Agreement with human rationales CI: Confidence indication F: Faithfulness RC: Rationale Consistency DC: Dataset Consistency
  • 26. Post-Hoc Explainability Methods via Rationale Selection: Aggregated Results [Bar chart: mean of diagnostic property measures for e-SNLI, axis range 0.4 to 0.7; models: Transformer, CNN, LSTM; methods: Random, ShapSampl, LIME, Occlusion, Saliency_u, Saliency_l2, InputXGrad_u, InputXGrad_l2, GuidedBP_u, GuidedBP_l2]
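Occlusion is one of the methods named in this chart. A minimal sketch of the general idea follows: each token is replaced by a placeholder in turn, and its saliency is the resulting drop in the model's confidence. The function name `occlusion_saliency`, the `[MASK]` placeholder and the toy classifier are assumptions for illustration, not the implementation evaluated in the talk.

```python
import math
from typing import Callable, List, Sequence


def occlusion_saliency(
    predict: Callable[[Sequence[str]], float],
    tokens: Sequence[str],
    mask_token: str = "[MASK]",
) -> List[float]:
    """Score each token by how much the prediction drops when it is occluded."""
    base = predict(tokens)
    scores = []
    for i in range(len(tokens)):
        occluded = list(tokens[:i]) + [mask_token] + list(tokens[i + 1:])
        scores.append(base - predict(occluded))
    return scores


# Toy stand-in for a trained classifier: a sigmoid over summed word weights.
WEIGHTS = {"great": 2.0, "love": 1.5, "terrible": -2.0}


def toy_predict(tokens: Sequence[str]) -> float:
    score = sum(WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-score))


tokens = ["i", "love", "this", "great", "movie"]
scores = occlusion_saliency(toy_predict, tokens)
for tok, s in zip(tokens, scores):
    print(f"{tok}: {s:.3f}")
```

The method needs one extra forward pass per token and no access to gradients, so it treats the model as a black box.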
  • 27. Generating Free Text Explanations https://www.politifact.com/factchecks/2020/apr/30/donald-trump/trumps-boasts-testing-spring-cherry-picking-data/
  • 29. Generating Free Text Explanations Claim: We’ve tested more than every country combined. Ruling Comments: Responding to weeks of criticism over his administration’s COVID-19 response, President Donald Trump claimed at a White House briefing that the United States has well surpassed other countries in testing people for the virus. "We’ve tested more than every country combined," Trump said April 27 […] We emailed the White House for comment but never heard back, so we turned to the data. Trump’s claim didn’t stand up to scrutiny. In raw numbers, the United States has tested more people than any other individual country — but nowhere near more than "every country combined" or, as he said in his tweet, more than "all major countries combined.”[…] The United States has a far bigger population than many of the "major countries" Trump often mentions. So it could have run far more tests but still have a much larger burden ahead than do nations like Germany, France or Canada.[…] Joint Model Veracity Label Justification/ Explanation Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein (2020). Generating Fact Checking Explanations. In ACL. https://www.aclweb.org/anthology/2020.acl-main.656/
  • 30. Generating Free Text Explanations: (Manual) Evaluation Measures • Explanation Quality • Coverage. The explanation contains important, salient information and does not miss any points important to the fact check. • Non-redundancy. The explanation does not contain any information that is redundant/repeated/not relevant to the claim or the fact check. • Non-contradiction. The explanation does not contain any pieces of information contradictory to the claim or the fact check. • Overall. Overall explanation quality. • Explanation Informativeness. Veracity label for a claim provided based on the automatically generated explanations only.
  • 32. Overall Take-Aways • Why explainability? • understanding if a model is right for the right reasons • Generated explanations can help users understand: • inner workings of a model (model understanding) • how a model arrived at a prediction (decision understanding) • Explainability can enable: • human-in-the-loop model development • human-in-the-loop data selection
  • 33. Overall Take-Aways • Caveats: • There can be more than one correct explanation • Different explainability methods provide different explanations • Most work on simple (image/text) classification tasks • Different streams of explainability methods have different benefits and downsides • Perform well w.r.t. different properties • Requirements for annotated training data • Joint vs. post-hoc explanation generation • Black box vs. white box • Hypothesis testing vs. bottom-up understanding • One-time analysis vs. continuous monitoring
  • 34. Where to from here? • Making explanations useful to model trainers and end users • Understanding how to interpret different explanations • Combination of model and decision understanding • Different explanation granularity for different users (e.g. content moderator vs. online platform user) • Some research on generating explanations, relatively little work on understanding in what context they are useful • (Automatically) evaluating explanations • Human-in-the-loop development
  • 36. Thanks to my PhD students and colleagues! CopeNLU https://copenlu.github.io/ • Pepa Atanasova • Dustin Wright • Jakob Grue Simonsen • Christina Lioma
  • 37. Referenced Papers • Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, Alexander M. Rush. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks. In Proceedings of IEEE TVCG 2018. • James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal. FEVER: a large-scale dataset for Fact Extraction and VERification. In Proceedings of NAACL 2018. • Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom. e-SNLI: Natural Language Inference with Natural Language Explanations. In Proceedings of NeurIPS 2018. • Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein. A Diagnostic Study of Explainability Techniques for Text Classification. In Proceedings of EMNLP 2020. • Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein. Generating Fact Checking Explanations. In Proceedings of ACL 2020.
  • 38. Other Relevant Papers • Pepa Atanasova*, Dustin Wright*, Isabelle Augenstein. Generating Label Cohesive and Well-Formed Adversarial Claims. EMNLP 2020. • Nils Rethmeier, Vageesh Kumar Saxena, Isabelle Augenstein. TX-Ray: Quantifying and Explaining Model-Knowledge Transfer in (Un)Supervised NLP. In Proceedings of the Conference on UAI 2020. • Wojciech Ostrowski, Arnav Arora, Pepa Atanasova, Isabelle Augenstein. Multi-Hop Fact Checking of Political Claims. Preprint, 2020. • Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen and Jakob Grue Simonsen. MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. In Proceedings of EMNLP 2019. • Liesbeth Allein, Isabelle Augenstein, Marie-Francine Moens. Time-Aware Evidence Ranking for Fact-Checking. Preprint, 2020. (*equal contributions)