Yves Peirsman presents several instances where bias has posed a risk to the successful adoption of NLP systems, and discusses what techniques exist to discover these biases before the systems are put in production.
2. A primer in NLP
Artificial Intelligence → Natural Language Processing
● Machine translation
● Sentiment analysis
● Information retrieval
● Information extraction
● Text classification
3. We provide consultancy for companies that need guidance in the NLP domain.
We develop software and train custom NLP models for challenging or domain-specific applications.
4. NLP Town
Training data → Training process → Model
● We help annotate training data.
● We train models for NLP applications.
● We provide consultancy for NLP projects.
● We integrate models with workflows.
11. Word Embeddings
Word embeddings also encode bias:
● Man is to king as woman is to ___.
● Man is to programmer as woman is to ___.
Experiment:
● Measure the similarity between occupations and
○ A set of “male” words: man, son, father, he, him, etc.
○ A set of “female” words: woman, daughter, mother, she, her, etc.
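As a sketch, this similarity experiment might look as follows. The embeddings below are tiny made-up vectors standing in for real pretrained ones (e.g. word2vec or fastText), and the word sets are truncated:

```python
import numpy as np

# Toy 4-dimensional "embeddings"; all numbers are invented for illustration.
emb = {
    "man":        np.array([ 0.9, 0.1, 0.3, 0.0]),
    "he":         np.array([ 0.8, 0.0, 0.2, 0.1]),
    "woman":      np.array([-0.9, 0.1, 0.3, 0.0]),
    "she":        np.array([-0.8, 0.0, 0.2, 0.1]),
    "programmer": np.array([ 0.5, 0.4, 0.6, 0.2]),
    "nurse":      np.array([-0.5, 0.4, 0.6, 0.2]),
}

MALE_WORDS = ["man", "he"]       # in the talk: man, son, father, he, him, etc.
FEMALE_WORDS = ["woman", "she"]  # in the talk: woman, daughter, mother, she, her, etc.

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def gender_association(word):
    """Mean similarity to the male set minus mean similarity to the
    female set: positive leans "male", negative leans "female"."""
    male = np.mean([cosine(emb[word], emb[m]) for m in MALE_WORDS])
    female = np.mean([cosine(emb[word], emb[f]) for f in FEMALE_WORDS])
    return male - female

for occupation in ("programmer", "nurse"):
    print(f"{occupation}: {gender_association(occupation):+.3f}")
```

With real embeddings, occupations like "programmer" typically sit closer to the male word set, which is exactly the bias the analogy examples expose.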
13. Pretrained NLP models
Pretrained language models are a significant recent breakthrough in NLP:
● Language models predict masked words.
● They learn a lot about language.
● This knowledge can be reused in “downstream” tasks.
This movie won her an Oscar for best actress.
The keys to the house are on the table.
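The masked-word idea in miniature: a count-based predictor over a toy corpus. Real pretrained models (BERT and friends) do this with deep networks trained on billions of words, but the prediction task is the same; the corpus below is invented around the slide's two example sentences.

```python
from collections import Counter

corpus = [
    "this movie won her an oscar for best actress",
    "the keys to the house are on the table",
    "this film won him an award for best actor",
]

# Count (left, word, right) trigrams so we can guess a masked word
# from its immediate neighbours.
trigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for left, word, right in zip(tokens, tokens[1:], tokens[2:]):
        trigrams[(left, word, right)] += 1

def predict_masked(left, right):
    """Most frequent word seen between `left` and `right`, or None."""
    candidates = Counter()
    for (l, w, r), n in trigrams.items():
        if l == left and r == right:
            candidates[w] += n
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_masked("on", "table"))  # -> the
print(predict_masked("won", "an"))    # -> "her" or "him" (tied counts)
```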
15. Pretrained language models
Experiment: association with a large number of positive adjectives
● One of several recent Dutch BERT models
● Association between 240 positive adjectives and hij/zij (“he”/“she”):
○ aantrekkelijk (attractive), ambitieus (ambitious), intelligent, slim (smart), knap (handsome), nauwkeurig (accurate), nieuwsgierig (curious), etc.
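The experiment can be sketched as below. `masked_prob` is a stand-in for a real masked-LM scorer (e.g. querying a Dutch BERT for the probability of hij vs. zij in a template such as "[MASK] is <adjective>"); the probabilities here are invented, and only four of the 240 adjectives are shown.

```python
# Invented probabilities standing in for masked-LM scores.
SCORES = {
    "aantrekkelijk": {"hij": 0.40, "zij": 0.60},  # attractive
    "ambitieus":     {"hij": 0.65, "zij": 0.35},  # ambitious
    "intelligent":   {"hij": 0.55, "zij": 0.45},
    "nauwkeurig":    {"hij": 0.30, "zij": 0.70},  # accurate
}

def masked_prob(pronoun, adjective):
    """Stub for P(pronoun | '[MASK] is <adjective>') from a masked LM."""
    return SCORES[adjective][pronoun]

def bias_score(adjective):
    """Positive: the adjective leans male (hij); negative: female (zij)."""
    return masked_prob("hij", adjective) - masked_prob("zij", adjective)

for adjective in SCORES:
    print(f"{adjective}: {bias_score(adjective):+.2f}")
```

Summing these scores over all 240 adjectives gives an overall picture of which gender the model associates with positive language.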
17. Step 1: Identify bias with explainable AI
Challenge
● First we need to find out whether our models are biased: search for known, but also unexpected, biases.
● Explainable AI plays an important role here.
Experiment
● A simple classifier for toxic comments
● Example: "Stupid peace of shit stop deleting my stuff asshole go die and fall in a hole go to hell!"
18. Step 1: Identify bias with explainable AI
● Visualize the classifier features and their weights:
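As a stand-in for that experiment, here is a tiny bag-of-words logistic regression whose per-word weights can be read off directly — the simplest form of this kind of feature visualization. The training comments are invented:

```python
import math
from collections import defaultdict

# Invented toxic (1) and non-toxic (0) training comments.
data = [
    ("stupid idiot go away", 1),
    ("you are an idiot", 1),
    ("stupid stupid comment", 1),
    ("thanks for the helpful edit", 0),
    ("great article well written", 0),
    ("the weather is nice today", 0),
]

weights = defaultdict(float)  # one weight per word (bag of words)

def predict(tokens):
    """Probability that a comment is toxic."""
    z = sum(weights[t] for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))

# Plain stochastic gradient descent on the log-loss.
for _ in range(200):
    for text, label in data:
        tokens = text.split()
        error = predict(tokens) - label
        for t in tokens:
            weights[t] -= 0.1 * error

# "Explain" the model: the highest-weighted words drive toxic predictions.
for word, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{word:>10s} {w:+.2f}")
```

If a word like a dialect marker or a group name turns up among the top-weighted features, the classifier has learned a bias rather than toxicity.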
21. Step 2: Fixing and avoiding bias
Training data → Training process → Model
22. Step 2: Fixing and avoiding bias
Training data → Training process → Model
● Ensure the training data is free of bias.
23. Step 2: Fixing and avoiding bias
Bias in annotation
● Inform annotators about possible confounding factors, such as dialect.
○ Example: if people are informed that a tweet contains African American English dialect, they are less likely to label it as offensive (Sap et al. 2019).
Bias in text
● If you create a new corpus, ensure your texts contain as little bias as possible.
● If you use existing data, try mitigating biases through data augmentation, over- and/or undersampling, etc.
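The data-augmentation idea can be sketched as counterfactual gender swapping: add a copy of each training sentence with gendered words exchanged, so the model sees both variants. The swap list here is a tiny illustrative sample — real lists are much larger, and ambiguous words like "her" need more care than a naive lookup table:

```python
# Minimal word-level swap table (illustrative; "her" -> "his" ignores
# the possessive/object ambiguity a real implementation must handle).
SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "man": "woman", "woman": "man", "father": "mother",
        "mother": "father", "son": "daughter", "daughter": "son"}

def gender_swap(sentence):
    """Return the sentence with each gendered word replaced by its counterpart."""
    return " ".join(SWAP.get(token, token) for token in sentence.split())

def augment(corpus):
    """Original corpus plus a gender-swapped copy of every sentence."""
    return corpus + [gender_swap(s) for s in corpus]

print(augment(["she is a doctor", "his son plays football"]))
```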
24. Step 2: Fixing and avoiding bias
Training data → Training process → Model
● Pick a training procedure that makes the system blind to bias.
25. Step 2: Fixing and avoiding bias
Adversarial training
● Train your model to shine at your task, but to fail at predicting “protected variables”, such as gender or race.
[Diagram: CV → Model]
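A minimal sketch of this idea with a linear encoder and two logistic heads: the adversary head tries to predict the protected variable, and a reversed gradient pushes the shared encoder to make that impossible while the task head stays accurate. Data, dimensions, and hyperparameters are all illustrative:

```python
import numpy as np

np.random.seed(0)

# Feature 0 carries the task label, feature 1 the protected variable.
X = np.array([[1., 1.], [1., -1.], [-1., 1.], [-1., -1.]])
y = np.array([1., 1., 0., 0.])   # task label     = sign of feature 0
z = np.array([1., 0., 1., 0.])   # protected var. = sign of feature 1

W = 0.1 * np.random.randn(2, 2)  # shared linear encoder
u = 0.1 * np.random.randn(2)     # task head
v = 0.1 * np.random.randn(2)     # adversary head
lr, lam = 0.1, 1.0               # learning rate, adversary weight

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for _ in range(1000):
    for x, yi, zi in zip(X, y, z):
        h = W @ x
        gy = sigmoid(u @ h) - yi   # task log-loss gradient factor
        gz = sigmoid(v @ h) - zi   # adversary log-loss gradient factor
        # Encoder: descend on the task loss, ASCEND on the adversary loss.
        W -= lr * np.outer(gy * u - lam * gz * v, x)
        u -= lr * gy * h           # task head minimises the task loss
        v -= lr * gz * h           # adversary minimises its own loss

task_acc = float(np.mean((sigmoid(X @ W.T @ u) > 0.5) == (y == 1)))
print("task accuracy:", task_acc)
print("encoder weight mass, task vs protected feature:",
      np.abs(W[:, 0]).sum(), "vs", np.abs(W[:, 1]).sum())
```

After training, the encoder devotes far more weight to the task feature than to the protected one, leaving the adversary close to chance.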
26. Step 2: Fixing and avoiding bias
Training data → Training process → Model
● Change the weights of the model so that the bias is reduced.
27. Step 2: Fixing and avoiding bias
Word embeddings
● Transform the embeddings so that bias is removed.
Pre-trained models
● Fine-tune on non-biased data, so that the models “forget” their bias.
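One well-known embedding transformation is to estimate a gender direction and project it out of gender-neutral words (the "neutralize" step of hard debiasing, in the spirit of Bolukbasi et al. 2016). A toy sketch with made-up vectors:

```python
import numpy as np

# Made-up 3-d embeddings; real work uses pretrained vectors and PCA
# over many definitional pairs (he/she, man/woman, ...).
emb = {
    "he":         np.array([ 1.0, 0.2, 0.1]),
    "she":        np.array([-1.0, 0.2, 0.1]),
    "programmer": np.array([ 0.6, 0.5, 0.3]),
}

# Gender direction from one definitional pair, normalised to unit length.
g = emb["he"] - emb["she"]
g = g / np.linalg.norm(g)

def neutralize(vec):
    """Remove the component of `vec` along the gender direction."""
    return vec - np.dot(vec, g) * g

debiased = neutralize(emb["programmer"])
print(np.dot(emb["programmer"], g), "->", np.dot(debiased, g))
```

The caveat on the next slide applies here: the projection removes the measured component, but subtler gender associations can survive in the remaining dimensions.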
28. Step 2: Fixing and avoiding bias
None of these methods is foolproof:
● You need to be aware of a bias before you can remove it.
● Often only “superficial” bias is removed, while deeper bias remains (Gonen and Goldberg 2019).
As AI developers, it is our responsibility to deploy our systems in such a way that potentially harmful side effects are minimized:
● Effective feedback loops
● Human-in-the-loop AI