A two-part guest lecture originally given at the University of Missouri-St. Louis. This brief introduction focuses on the origin and scope of recent research and debate in algorithmic fairness (a subset of AI Ethics) at a high level that does not require expertise in machine learning or programming.
7. Probabilistic vs Causal Policies
Rain Dancing Problems (End Drought)
• Key Question: Does rain dancing cause rain?
• Solution: Causal Rule (If yes, then rain dance)
• Doesn’t require prediction of anything
Umbrella Problems (Stay Dry)
• Key Question: Will it rain today?
• Solution: Probabilistic cost/benefit (many variables)
• Requires prediction of rain, and knowledge of payoffs
Takeaway: Some problems require prediction for good results. ML predicts better than other approaches.
Kleinberg, Jon. “Prediction Policy Problems.”
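To make the cost/benefit logic concrete, here is a minimal Python sketch of an umbrella decision. The rain probability would come from a predictive model, and all payoff numbers are invented for illustration:

```python
# Toy expected-utility version of the umbrella problem. The payoff
# numbers are hypothetical; p_rain would come from a predictive model.

def expected_utility(p_rain, carry_umbrella):
    if carry_umbrella:
        return -1.0                # small fixed cost of carrying
    return p_rain * -10.0          # large cost of getting soaked

def decide(p_rain):
    # Carry the umbrella only when that has the higher expected utility.
    return expected_utility(p_rain, True) > expected_utility(p_rain, False)

for p in (0.05, 0.2, 0.5):
    print(f"P(rain)={p:.2f} -> carry umbrella: {decide(p)}")
```

The causal rain-dance rule needs no such calculation; the umbrella rule is worthless without the predicted probability and the payoffs.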
8. How do we get the best of both?
First, we look at the difference.
9. Why are ML Policies Different from Causal Policies?
11. A Deeper Look: Probabilistic Decision Boundaries
Basics:
• (Un)Certainty of Cases
• Risk vs Class
• Red Errors, Blue Errors
Takeaways:
• Estimated Functions aren’t like known functions
• Different results for equally deserving people/groups
• The uncertain space is most interesting
  • Upside
  • Downside
[Figure: risk scores on a scale from ~0 to ~1, with a class boundary separating “Creditworthy” from “Not CW”; scores near the boundary form the uncertain space]
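A minimal sketch of the idea, with invented credit scores: the threshold is the class boundary, and applicants whose scores fall near it occupy the uncertain space, where equally deserving people can land on opposite sides:

```python
# Toy decision boundary over invented credit-risk scores in [0, 1].
applicants = {"A": 0.95, "B": 0.62, "C": 0.55, "D": 0.48, "E": 0.05}

THRESHOLD = 0.5   # the class boundary
BAND = 0.15       # scores this close to the boundary: the "uncertain space"

for name, score in applicants.items():
    label = "Creditworthy" if score >= THRESHOLD else "Not CW"
    if abs(score - THRESHOLD) < BAND:
        label += "   <- uncertain: a small shift in the estimate flips the class"
    print(f"{name}: score={score:.2f} -> {label}")
# Note C (0.55) and D (0.48): nearly identical estimates, opposite classes.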
16. Hypothesis 1: Garbage In, Garbage Out
Problem: Biased Data
• Structural Bias
• Cognitive Bias
• Historical Bias
• Stored in Protected Features
Solution: Blind the model to Protected Features
17. Attempts to Blind Models
Problem: Discriminatory Decisions
Goal: Race-Blindness
Original Solution: Drop Protected Features
• Problem: Redundant Encodings or Proxies
Solution 2: Eliminate proxy information
• Problem: Poor performance
Solution 3: Eliminate Proxies’ influence with penalty
• Problem: Downstream effects
[Figure: a feature diagram linking Race, Income, and City under each approach: 1) Drop Features, 2) Eliminate Proxy Information, 3) Penalize Model]
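A toy sketch of why the original solution fails, using a made-up population in which a remaining feature (city of residence) is strongly correlated with race, so the “blinded” data still carries the protected information as a redundant encoding:

```python
# Toy "redundant encoding": drop race, but a remaining feature (city)
# is strongly correlated with it, so the model is not really blind.
# The population and the 90% segregation rate are invented.
import random

random.seed(0)
population = []
for _ in range(10_000):
    race = random.choice(["group_a", "group_b"])
    majority_city = "north" if race == "group_a" else "south"
    minority_city = "south" if race == "group_a" else "north"
    city = majority_city if random.random() < 0.9 else minority_city
    population.append((race, city))

# After dropping race, city alone still recovers it most of the time.
correct = sum((city == "north") == (race == "group_a")
              for race, city in population)
print(f"Guessing race from city alone: {correct / len(population):.0%} accurate")
```

Eliminating the proxy (city) removes the leak but also removes legitimately predictive information, which is why solution 2 costs performance.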
18. What Went Wrong?
“disparate treatment doctrine does not appear to do much to regulate discriminatory data mining”
--Solon Barocas
“The race-aware predictor dominates the other prediction functions”
--Jon Kleinberg
Questions:
Should we still blind models to race?
Are there other reasons to eliminate sensitive information?
19. Hypothesis 2: Only Outcomes Matter, Make Them “Fair”
• Problem: Error-Rate Imbalances (COMPAS)
  • Structural Bias
  • Cognitive Bias
  • Historical Bias
  • Black-Box Algorithms Repeat These Errors
• Solution: Move the Decision Boundary to Fix Imbalances
20. Re-Thresholding Decision Boundaries
Goal: Make sure groups are in line with predetermined “fairness metrics”; move the decision boundary until you get the results you want.
Popular Metrics (there are more than 20 different ones):
• Model Blinding
• Demographic Parity
• Equal Opportunity
[Figure: the same risk-score scale, now with a new class boundary shifted within the uncertain space between “Creditworthy” and “Not CW”]
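A minimal sketch of re-thresholding for one popular metric, demographic parity; the score distributions and the 50% target rate are invented for illustration:

```python
# Toy re-thresholding for demographic parity: give each group its own
# boundary so approval rates match. Score distributions are invented.
import random

random.seed(1)
scores = {
    "group_a": [min(1, max(0, random.gauss(0.60, 0.15))) for _ in range(1000)],
    "group_b": [min(1, max(0, random.gauss(0.45, 0.15))) for _ in range(1000)],
}

TARGET_RATE = 0.5  # predetermined approval rate for every group

for group, s in scores.items():
    # Threshold = the score of the applicant at the target-rate cutoff.
    threshold = sorted(s, reverse=True)[int(len(s) * TARGET_RATE)]
    rate = sum(x >= threshold for x in s) / len(s)
    print(f"{group}: boundary moved to {threshold:.2f}, approved {rate:.0%}")
# Approval rates now match, but the two groups face different boundaries
# (slide 30's question: is it acceptable to use different rules for
# different people?).
```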
22. Quick Recap From Day 1
About ML
• ML is Function Estimation
• ML Decisions are Mostly Classification Problems
• These Classifications Are Always Uncertain
About Ethics
• COMPAS (ProPublica and Northpointe)
• Hypothesis 1: Biased Data, Erase Protected Information
  • Didn’t have legal teeth or practical effectiveness
• Hypothesis 2: Error-Rate Imbalance, Move Decision Boundary
Researchers’ Original Solution: Fair Classifier
24. Blinding the Model
Group Unaware: Class features and all “proxy” information removed
• Total Profit: 25,600
• Decision Rules: Same
• True Positive Rate: Unequal
• Percent Approved: Unequal
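A toy recreation of the kind of experiment summarized here (the scores and repayment behavior are simulated, not the slide’s actual numbers): one shared threshold applied to two groups with different score distributions, and the per-group outcomes diverge:

```python
# Toy "group unaware" experiment: one shared decision rule, two groups
# with different (simulated) score distributions. All numbers invented.
import random

random.seed(2)

def make_group(mean_score, n=1000):
    people = []
    for _ in range(n):
        score = min(1, max(0, random.gauss(mean_score, 0.15)))
        repays = random.random() < score   # higher score -> likelier to repay
        people.append((score, repays))
    return people

groups = {"group_a": make_group(0.60), "group_b": make_group(0.45)}
THRESHOLD = 0.5   # the same rule for everyone

for name, people in groups.items():
    approved = sum(score >= THRESHOLD for score, _ in people) / len(people)
    repayers = [score for score, repays in people if repays]
    tpr = sum(score >= THRESHOLD for score in repayers) / len(repayers)
    print(f"{name}: percent approved={approved:.0%}, true positive rate={tpr:.0%}")
# Identical decision rule, unequal approval rates and unequal TPRs.
```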
27. What Went Wrong? (Legally)
“smaller differences can constitute adverse impact and greater differences may not, depending on circumstances”
--Solon Barocas
28. What Went Wrong? (Conceptually)
“without precise definitions of beliefs about the state of the world and…harms one wishes to prevent, our results show that it is not possible to make progress”
--Sorelle Friedler
Observed Space     Construct Space          Predicted Space
GPA                Success in High School   College Performance
Arrest Record      Criminal Past            Recidivism
Experience         Job Knowledge            Productivity
29. What Went Wrong? (Practically)
“since the optimal constrained algorithms differ from the optimal unconstrained algorithm, fairness has a cost”
--Sam Corbett-Davies
30. Do we need new laws to address AI Ethics?
Is it acceptable to use different rules for different people?
What matters? The Rule, the Outcomes, Both…?
Can fairness exist outside of context?
32. Feedback Loops (Exacerbating Feedback)
[Figure: feedback cycle: Skewed Dataset → Model Trained on Skewed Data → Poor Performance on Misrepresented Class → Unobservability or Over-observability → Under- and Over-Representation in Sample → back to Skewed Dataset]
Example 1: Predictive Policing (Over-Observation/Representation)
• Targeting → Arrest → Data Skewed → Poor Sample for Model → Skewed Prediction → back to Targeting
33. Feedback Loops (Exacerbating Feedback)
Example 2: Loans (Under-Observation/Representation)
• Group Traditionally Denied Loans → Don’t Get Loans → Can’t Pay Back → Can’t Establish Credit → Predicted as Not “Credit-Worthy” → Denied Loans
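A toy simulation of the loan loop above; the scores, threshold, and update rules are all invented, but they show how a small initial gap compounds when denial blocks the chance to build credit history:

```python
# Toy simulation of the loan feedback loop: denial blocks credit-history
# building, which depresses future scores. All constants are invented.
scores = {"group_a": 0.60, "group_b": 0.50}   # average observed score
THRESHOLD = 0.55

for round_num in range(1, 6):
    for group, score in scores.items():
        if score >= THRESHOLD:
            scores[group] = min(1.0, score + 0.05)  # approved: history builds
        else:
            scores[group] = max(0.0, score - 0.02)  # denied: no history to show
    summary = ", ".join(f"{g}={s:.2f}" for g, s in scores.items())
    print(f"round {round_num}: {summary}")
# The initial 0.10 gap widens every round: under-observation compounds.
```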
34. A Closer Look: Feedback and Inclusion/Exclusion
Survivorship Bias: Focus on those that made it through selection
• Wald’s damaged planes
• Under/Over-Representation as Qualitative and Quantitative
• Affects Interpretations, Observability, and Solutions
36. A Closer Look: Feedback and Inclusion/Exclusion
Questions:
• Can an organization be responsible for responding to realities it cannot observe?
• Is it acceptable to experiment on candidates to observe their outcomes?
37. Critical Thinking About Predictive Policing
Seeming Paradox: Arrest data is a census, but feedback loops result from (and exacerbate) minority over-representation. How?
• Arrest as a proxy for crime
• Statistical vs Historical Bias
• Garbage In, Garbage Out fails here (Wrong Proxies)
Questions:
Would we say that the sociological data is biased?
What could strengthen the proxy relationship?
[Figure: two pipelines: Arrest Data → Model → Arrest Prediction (Representative) vs. Arrest Data → Model → Crime Prediction (Not Representative)]
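A toy sketch of the arrest-as-proxy point: two neighborhoods with identical (simulated) crime rates but different patrol levels produce very different arrest counts, so arrest data can be a faithful census of arrests while badly misrepresenting crime:

```python
# Toy arrest-as-proxy illustration: identical underlying crime rates,
# different patrol levels. All rates are invented.
import random

random.seed(3)
TRUE_CRIME_RATE = 0.10                       # the same in both neighborhoods
patrol_level = {"north": 0.9, "south": 0.3}  # chance a crime leads to arrest

for hood, patrol in patrol_level.items():
    crimes = sum(random.random() < TRUE_CRIME_RATE for _ in range(10_000))
    arrests = sum(random.random() < patrol for _ in range(crimes))
    print(f"{hood}: crimes={crimes}, arrests={arrests}")
# Arrest data is a faithful census of arrests (no statistical bias), yet a
# model trained on it to predict *crime* inherits the skewed proxy.
```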
38. Dynamics, Gaming, and Context
College Admissions and ML
Features:
• GPA
• Extracurriculars
• ACT, SAT
39. Dynamics, Gaming, and Context
Goodhart’s Law (Gaming):
“When a measure becomes a target, it ceases to be a good measure.”
Features:
• GPA:
  • Grade Inflation
• Extracurriculars:
  • Filling the Boxes
• ACT, SAT:
  • Tutoring and Teaching to Test
40. Dynamics, Gaming, and Context
Systemic Bias: When a process supports or reproduces specific outcomes.
Features:
• GPA:
  • Grade Inflation
  • Quality of Education
• Extracurriculars:
  • Filling the Boxes
  • Time and Money
• ACT, SAT:
  • Tutoring and Teaching to Test
  • Less Focus on Test
41. Dynamics, Gaming, and Context
Testing Bias: “Differential Validity of Test Scores for Sub-Groups”
Features:
• GPA:
  • Grade Inflation
  • Quality of Education
  • Unsuitable Curriculum
• Extracurriculars:
  • Filling the Boxes
  • Time and Money
  • Performance Interpretation
• ACT, SAT:
  • Tutoring and Teaching to Test
  • Less Focus on Test
  • Cultural Focus
42. Biggest Problem of All
“the discussion in the technical community…is happening without a moral framework…and you know, it’s kinda amateur hour…, a lot of it is, let’s design a loss function that measures your utility and your fairness and whatever else and just optimize the heck out of it”
--Arvind Narayanan
43. A Few Starter Questions
• Is data the cause of bias in ML decisions?
• Do we need new laws (not disparate impact/treatment) to address these systems?
• How much needs to be considered from outside an organization when claiming that a system is fair?
• Is automated decision making ethically different from human decision making?
• Can there be unfair models for reasons other than group differences?
• Should we study fairness with privacy, security, transparency…or not?