The impact of AI on society keeps growing, and it is not all good. We as Data Scientists have to put in real work so that we don't end up in Machine Learning hell. Every Data Scientist should account for fairness... but how? In this talk, I'll show some recent examples of how AI led to unfair outcomes at scale and argue that fairness should be part of the standard toolbox of Data Scientists. Building on cutting-edge research, I'll show how an adversarial classifier can force a model to be fair. The talk ends with some pointers on how to embed fairness in your organisation.
2. Data Scientists have to put in the work
so that society does not end up in ML hell
“The gates of hell are open night and day;
Smooth the descent, and easy is the way:
But to return, and view the cheerful skies,
In this the task and mighty labor lies.”
The Works of Virgil (John Dryden)
3. About me:
one of those Data Scientists who should put in the work
GoDataDriven: Driving Your Success With Data and AI
Henk Griffioen
Lead Data Scientist
@
4. The impact of AI on society is not all good. AI can encode and
amplify human biases, leading to unfair outcomes at scale
5. Fairness is a hot topic and gaining traction!
https://fairmlclass.github.io/
10. p%-rule: a measure of demographic parity
The ratio of
• the probability of a positive outcome given that the sensitive attribute is true;
• the probability of a positive outcome given that the sensitive attribute is false;
should be no less than p:100.
Examples: 40% : 50% = 80%; 10% : 50% = 20%
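To make the rule concrete, here is a minimal numpy sketch (the function name and argument layout are mine, not from the slides); it takes the smaller of the two ratios so the measure is symmetric:

```python
import numpy as np

def p_rule(y_pred, sensitive):
    """p%-rule: ratio of positive-outcome rates between the two groups.

    y_pred    : 0/1 array of predicted outcomes
    sensitive : boolean array, True where the sensitive attribute holds
    """
    rate_true = y_pred[sensitive].mean()    # P(positive | attribute true)
    rate_false = y_pred[~sensitive].mean()  # P(positive | attribute false)
    ratio = rate_true / rate_false
    return min(ratio, 1 / ratio) * 100      # symmetric, expressed as p%

# The slide's examples:
print(min(0.40 / 0.50, 0.50 / 0.40) * 100)  # 80.0
print(min(0.10 / 0.50, 0.50 / 0.10) * 100)  # 20.0
```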
11. Our model is unfair: low probability of high income for
black people and women
13. Many reasons why bias is creeping into our systems (Barocas & Selbst, 2016)
• Skewed sample: initial bias that compounds over time
• Tainted examples: bias in the data caused by humans
• Sample size disparity: minority groups not as well represented
• Limited features: less informative data collected on minority groups
• Proxies: data implicitly encoding sensitive attributes
• …
15. Ethnic profiling by the Dutch tax authorities
Profiling people for fraud
“A daycare center in Almere sounded the alarm when only non-Dutch parents were confronted with discontinuation of childcare allowances…
…The Tax and Customs Administration says that it uses the data on Dutch nationality or non-Dutch nationality in the so-called automatic risk selection for fraud.”
https://www.nrc.nl/nieuws/2019/05/20/autoriteit-persoonsgegevens-onderzoekt-mogelijke-discriminatie-door-belastingdienst-a3960840
16. Is it enough to leave out data on (second) nationality?
…In a response, the Tax and Customs Administration states that the information about the (second) nationality of parents or intermediary is not used in this investigation…
“Since 2014, a second nationality with Dutch nationality is no longer included in the basic registration. This has been introduced to prevent discrimination for people with dual nationality.”
Is this enough to ensure that non-Dutch parents in Almere will not suffer another tax injustice?
Towards a fair future?
17. The model is still unfair without sensitive data. Biases are still encoded by proxies in the dataset!
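A quick way to check this claim is to see whether a classifier can recover the sensitive attribute from the remaining features. Below is a minimal sketch on synthetic data; the column names and numbers are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: "postcode" correlates with the sensitive attribute,
# so dropping the attribute itself still leaves a proxy behind.
rng = np.random.default_rng(0)
n = 5_000
sensitive = rng.integers(0, 2, n)             # 0/1 sensitive attribute
postcode = sensitive + rng.integers(0, 2, n)  # noisy, correlated proxy
income = rng.normal(size=n)                   # unrelated feature
X = pd.DataFrame({"postcode": postcode, "income": income})

# How well can we recover the sensitive attribute without using it directly?
auc = cross_val_score(
    GradientBoostingClassifier(), X, sensitive, scoring="roc_auc"
).mean()
print(f"AUC for recovering the sensitive attribute: {auc:.2f}")
# An AUC well above 0.5 means the dataset still encodes the attribute.
```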
26. There is no single mathematical formulation of fairness: there are many (conflicting) measures
http://www.ece.ubc.ca/~mjulia/publications/Fairness_Definitions_Explained_2018.pdf
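A toy illustration of the conflict (all numbers invented): a perfect predictor gives both groups equal true positive rates, satisfying equal opportunity, yet unequal positive rates, violating demographic parity, simply because the base rates differ:

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0])
y_pred = y_true.copy()                # a perfect predictor
group = np.array([0] * 6 + [1] * 4)  # group 0: 6 people, group 1: 4 people

for g in (0, 1):
    mask = group == g
    pos_rate = y_pred[mask].mean()             # P(pred=1 | group)
    tpr = y_pred[mask & (y_true == 1)].mean()  # P(pred=1 | y=1, group)
    print(f"group {g}: positive rate {pos_rate:.2f}, TPR {tpr:.2f}")
# group 0: positive rate 0.50, TPR 1.00
# group 1: positive rate 0.25, TPR 1.00
# Equal opportunity holds; demographic parity (a 50% p%-rule) does not.
```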
27. Many ML fairness approaches
https://dzone.com/articles/machine-learning-models-bias-mitigation-strategies
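One family of approaches, adversarial debiasing, is what this talk builds on; see blog.godatadriven.com/fairness-in-pytorch for the full version. Below is a minimal PyTorch sketch: the layer sizes, the trade-off weight lambda_, and the training-step structure are illustrative choices, not the talk's exact setup.

```python
import torch
import torch.nn as nn

# x: (batch, 20) features; y, s: (batch, 1) floats in {0, 1}.
# The classifier predicts the label y; the adversary tries to recover the
# sensitive attribute s from the classifier's output. Training the classifier
# to fool the adversary pushes its predictions towards demographic parity.
clf = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
adv = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
clf_opt = torch.optim.Adam(clf.parameters())
adv_opt = torch.optim.Adam(adv.parameters())
bce = nn.BCEWithLogitsLoss()
lambda_ = 1.0  # how strongly fairness is traded off against accuracy

def train_step(x, y, s):
    # 1) update the adversary: predict s from the (detached) classifier output
    adv_opt.zero_grad()
    adv_loss = bce(adv(clf(x).detach()), s)
    adv_loss.backward()
    adv_opt.step()
    # 2) update the classifier: predict y well AND fool the adversary
    clf_opt.zero_grad()
    z = clf(x)
    clf_loss = bce(z, y) - lambda_ * bce(adv(z), s)
    clf_loss.backward()
    clf_opt.step()
```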
28. Fairness should be a key part of your product process
• State problem: Is using an algorithm ethical? Can it be misused? Who might be harmed?
• Construct dataset: Are there biased features? Are groups over-/underrepresented? Should we get more data?
• Select algorithm: Is our objective function in line with ethics? Do we need separate models for minority populations?
• Train model: Do we need to enforce fairness? What metrics should we track?
• Test model: What fairness metrics? Do the metrics capture consumer needs?
• Deploy solution: Are we deploying on a population not captured in the dataset?
• Gather feedback: Does the solution enforce unfair feedback loops? Is intervention needed?
https://www.slideshare.net/KrishnaramKenthapadi/fairnessaware-machine-learning-practical-challenges-and-lessons-learned-www-2019-tutorial
30. Data Scientists have to put in the work
so that society does not end up in ML hell
“The gates of hell are open night and day;
Smooth the descent, and easy is the way:
But to return, and view the cheerful skies,
In this the task and mighty labor lies.”
The Works of Virgil (John Dryden)
31. Where to go from here?
An ethics checklist for data scientists
• http://deon.drivendata.org/
Tutorial on fairness for products
• sites.google.com/view/wsdm19-fairness-tutorial
Community concerned with fairness in ML
• www.fatml.org
Our blogs
• blog.godatadriven.com/fairness-in-ml
• blog.godatadriven.com/fairness-in-pytorch