A two-part guest lecture originally given at the University of Missouri-St. Louis. This brief introduction focuses on the origin and scope of recent research and debate in algorithmic fairness (a subset of AI Ethics) at a high level that does not require expertise in machine learning or programming.
7. Probabilistic vs Causal Policies
Rain Dancing Problems (End Drought)
• Key Question: Does rain dancing cause rain?
• Solution: Causal Rule (If yes, then rain dance)
• Doesn’t require prediction of anything
Umbrella Problems (Stay Dry)
• Key Question: Will it rain today?
• Solution: Probabilistic cost/benefit (many variables)
• Requires prediction of rain, and knowledge of payoffs
Takeaway: Some problems require prediction for good results. ML predicts better than other approaches.
Kleinberg, Jon. “Prediction Policy Problems.”
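To make the cost/benefit logic concrete, here is a minimal Python sketch of an umbrella decision. The rain probability would come from a predictive model, and all payoff numbers are invented for illustration:

```python
# Toy expected-utility version of the umbrella problem. The payoff
# numbers are hypothetical; p_rain would come from a predictive model.

def expected_utility(p_rain, carry_umbrella):
    if carry_umbrella:
        return -1.0                # small fixed cost of carrying
    return p_rain * -10.0          # large cost of getting soaked

def decide(p_rain):
    # Carry the umbrella only when that has the higher expected utility.
    return expected_utility(p_rain, True) > expected_utility(p_rain, False)

for p in (0.05, 0.2, 0.5):
    print(f"P(rain)={p:.2f} -> carry umbrella: {decide(p)}")
```

The causal rain-dance rule needs no such calculation; the umbrella rule is worthless without the predicted probability and the payoffs.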
8. How do we get the best of both?
First, we look at the difference.
9. Why are ML Policies Different from Causal Policies?
11. A Deeper Look: Probabilistic Decision Boundaries
Basics:
• (Un)Certainty of Cases
• Risk vs Class
• Red Errors, Blue Errors
Takeaways:
• Estimated Functions aren’t like known functions
• Different results for equally deserving people/groups
• The uncertain space is most interesting
  • Upside
  • Downside
[Figure: risk scores on a scale from ~0 to ~1, with a class boundary separating “Creditworthy” from “Not CW”; scores near the boundary form the uncertain space]
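A minimal sketch of the idea, with invented credit scores: the threshold is the class boundary, and applicants whose scores fall near it occupy the uncertain space, where equally deserving people can land on opposite sides:

```python
# Toy decision boundary over invented credit-risk scores in [0, 1].
applicants = {"A": 0.95, "B": 0.62, "C": 0.55, "D": 0.48, "E": 0.05}

THRESHOLD = 0.5   # the class boundary
BAND = 0.15       # scores this close to the boundary: the "uncertain space"

for name, score in applicants.items():
    label = "Creditworthy" if score >= THRESHOLD else "Not CW"
    if abs(score - THRESHOLD) < BAND:
        label += "   <- uncertain: a small shift in the estimate flips the class"
    print(f"{name}: score={score:.2f} -> {label}")
# Note C (0.55) and D (0.48): nearly identical estimates, opposite classes.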
16. Hypothesis 1: Garbage In, Garbage Out
Problem: Biased Data
• Structural Bias
• Cognitive Bias
• Historical Bias
• Stored in Protected Features
Solution: Blind the model to Protected Features
17. Attempts to Blind Models
Problem: Discriminatory Decisions
Goal: Race-Blindness
Original Solution: Drop Protected Features
• Problem: Redundant Encodings or Proxies
Solution 2: Eliminate proxy information
• Problem: Poor performance
Solution 3: Eliminate Proxies’ influence with penalty
• Problem: Downstream effects
[Figure: a feature diagram linking Race, Income, and City under each approach: 1) Drop Features, 2) Eliminate Proxy Information, 3) Penalize Model]
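A toy sketch of why the original solution fails, using a made-up population in which a remaining feature (city of residence) is strongly correlated with race, so the “blinded” data still carries the protected information as a redundant encoding:

```python
# Toy "redundant encoding": drop race, but a remaining feature (city)
# is strongly correlated with it, so the model is not really blind.
# The population and the 90% segregation rate are invented.
import random

random.seed(0)
population = []
for _ in range(10_000):
    race = random.choice(["group_a", "group_b"])
    majority_city = "north" if race == "group_a" else "south"
    minority_city = "south" if race == "group_a" else "north"
    city = majority_city if random.random() < 0.9 else minority_city
    population.append((race, city))

# After dropping race, city alone still recovers it most of the time.
correct = sum((city == "north") == (race == "group_a")
              for race, city in population)
print(f"Guessing race from city alone: {correct / len(population):.0%} accurate")
```

Eliminating the proxy (city) removes the leak but also removes legitimately predictive information, which is why solution 2 costs performance.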
18. What Went Wrong?
“disparate treatment doctrine does not appear to do much to regulate discriminatory data mining”
--Solon Barocas
“The race-aware predictor dominates the other prediction functions”
--Jon Kleinberg
Questions:
Should we still blind models to race?
Are there other reasons to eliminate sensitive information?
19. Hypothesis 2: Only Outcomes Matter, Make Them “Fair”
• Problem: Error-Rate Imbalances (COMPAS)
  • Structural Bias
  • Cognitive Bias
  • Historical Bias
  • Black-Box Algorithms Repeat These Errors
• Solution: Move the Decision Boundary to Fix Imbalances
20. Re-Thresholding Decision Boundaries
Goal: Make sure groups are in line with predetermined “fairness metrics”; move the decision boundary until you get the results you want.
Popular Metrics (there are more than 20 different ones):
• Model Blinding
• Demographic Parity
• Equal Opportunity
[Figure: the same risk-score scale, now with a new class boundary shifted within the uncertain space between “Creditworthy” and “Not CW”]
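A minimal sketch of re-thresholding for one popular metric, demographic parity; the score distributions and the 50% target rate are invented for illustration:

```python
# Toy re-thresholding for demographic parity: give each group its own
# boundary so approval rates match. Score distributions are invented.
import random

random.seed(1)
scores = {
    "group_a": [min(1, max(0, random.gauss(0.60, 0.15))) for _ in range(1000)],
    "group_b": [min(1, max(0, random.gauss(0.45, 0.15))) for _ in range(1000)],
}

TARGET_RATE = 0.5  # predetermined approval rate for every group

for group, s in scores.items():
    # Threshold = the score of the applicant at the target-rate cutoff.
    threshold = sorted(s, reverse=True)[int(len(s) * TARGET_RATE)]
    rate = sum(x >= threshold for x in s) / len(s)
    print(f"{group}: boundary moved to {threshold:.2f}, approved {rate:.0%}")
# Approval rates now match, but the two groups face different boundaries
# (slide 30's question: is it acceptable to use different rules for
# different people?).
```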
22. Quick Recap From Day 1
About ML
• ML is Function Estimation
• ML Decisions are Mostly Classification Problems
• These Classifications Are Always Uncertain
About Ethics
• COMPAS (ProPublica and Northpointe)
• Hypothesis 1: Biased Data, Erase Protected Information
  • Didn’t have legal teeth or practical effectiveness
• Hypothesis 2: Error-Rate Imbalance, Move Decision Boundary
Researchers’ Original Solution: Fair Classifier
24. Blinding the Model
Group Unaware: Class features and all “proxy” information removed
• Total Profit: 25,600
• Decision Rules: Same
• True Positive Rate: Unequal
• Percent Approved: Unequal
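A toy recreation of the kind of experiment summarized here (the scores and repayment behavior are simulated, not the slide’s actual numbers): one shared threshold applied to two groups with different score distributions, and the per-group outcomes diverge:

```python
# Toy "group unaware" experiment: one shared decision rule, two groups
# with different (simulated) score distributions. All numbers invented.
import random

random.seed(2)

def make_group(mean_score, n=1000):
    people = []
    for _ in range(n):
        score = min(1, max(0, random.gauss(mean_score, 0.15)))
        repays = random.random() < score   # higher score -> likelier to repay
        people.append((score, repays))
    return people

groups = {"group_a": make_group(0.60), "group_b": make_group(0.45)}
THRESHOLD = 0.5   # the same rule for everyone

for name, people in groups.items():
    approved = sum(score >= THRESHOLD for score, _ in people) / len(people)
    repayers = [score for score, repays in people if repays]
    tpr = sum(score >= THRESHOLD for score in repayers) / len(repayers)
    print(f"{name}: percent approved={approved:.0%}, true positive rate={tpr:.0%}")
# Identical decision rule, unequal approval rates and unequal TPRs.
```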
27. What Went Wrong? (Legally)
“smaller differences can constitute adverse impact and greater differences may not, depending on circumstances”
--Solon Barocas
28. What Went Wrong? (Conceptually)
“without precise definitions of beliefs about the state of the world and…harms one wishes to prevent, our results show that it is not possible to make progress”
--Sorelle Friedler
Observed Space     Construct Space          Predicted Space
GPA                Success in High School   College Performance
Arrest Record      Criminal Past            Recidivism
Experience         Job Knowledge            Productivity
29. What Went Wrong? (Practically)
“since the optimal constrained algorithms differ from the optimal unconstrained algorithm, fairness has a cost”
--Sam Corbett-Davies
30. Do we need new laws to address AI Ethics?
Is it acceptable to use different rules for different people?
What matters? The Rule, the Outcomes, Both…?
Can fairness exist outside of context?
32. Feedback Loops (Exacerbating Feedback)
[Figure: feedback cycle: Skewed Dataset → Model Trained on Skewed Data → Poor Performance on Misrepresented Class → Unobservability or Over-observability → Under- and Over-Representation in Sample → back to Skewed Dataset]
Example 1: Predictive Policing (Over-Observation/Representation)
• Targeting → Arrest → Data Skewed → Poor Sample for Model → Skewed Prediction → back to Targeting
33. Feedback Loops (Exacerbating Feedback)
Example 2: Loans (Under-Observation/Representation)
• Group Traditionally Denied Loans → Don’t Get Loans → Can’t Pay Back → Can’t Establish Credit → Predicted as Not “Credit-Worthy” → Denied Loans
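A toy simulation of the loan loop above; the scores, threshold, and update rules are all invented, but they show how a small initial gap compounds when denial blocks the chance to build credit history:

```python
# Toy simulation of the loan feedback loop: denial blocks credit-history
# building, which depresses future scores. All constants are invented.
scores = {"group_a": 0.60, "group_b": 0.50}   # average observed score
THRESHOLD = 0.55

for round_num in range(1, 6):
    for group, score in scores.items():
        if score >= THRESHOLD:
            scores[group] = min(1.0, score + 0.05)  # approved: history builds
        else:
            scores[group] = max(0.0, score - 0.02)  # denied: no history to show
    summary = ", ".join(f"{g}={s:.2f}" for g, s in scores.items())
    print(f"round {round_num}: {summary}")
# The initial 0.10 gap widens every round: under-observation compounds.
```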
34. A Closer Look: Feedback and Inclusion/Exclusion
Survivorship Bias: Focus on those that made it through selection
• Wald’s damaged planes
• Under/Over-Representation as Qualitative and Quantitative
• Affects Interpretations, Observability, and Solutions
36. A Closer Look: Feedback and Inclusion/Exclusion
Questions:
• Can an organization be responsible for responding to realities it cannot observe?
• Is it acceptable to experiment on candidates to observe their outcomes?
37. Critical Thinking About Predictive Policing
Seeming Paradox: Arrest data is a census, but feedback loops result from (and exacerbate) minority over-representation. How?
• Arrest as a proxy for crime
• Statistical vs Historical Bias
• Garbage In, Garbage Out fails here (Wrong Proxies)
Questions:
Would we say that the sociological data is biased?
What could strengthen the proxy relationship?
[Figure: two pipelines: Arrest Data → Model → Arrest Prediction (Representative) vs. Arrest Data → Model → Crime Prediction (Not Representative)]
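A toy sketch of the arrest-as-proxy point: two neighborhoods with identical (simulated) crime rates but different patrol levels produce very different arrest counts, so arrest data can be a faithful census of arrests while badly misrepresenting crime:

```python
# Toy arrest-as-proxy illustration: identical underlying crime rates,
# different patrol levels. All rates are invented.
import random

random.seed(3)
TRUE_CRIME_RATE = 0.10                       # the same in both neighborhoods
patrol_level = {"north": 0.9, "south": 0.3}  # chance a crime leads to arrest

for hood, patrol in patrol_level.items():
    crimes = sum(random.random() < TRUE_CRIME_RATE for _ in range(10_000))
    arrests = sum(random.random() < patrol for _ in range(crimes))
    print(f"{hood}: crimes={crimes}, arrests={arrests}")
# Arrest data is a faithful census of arrests (no statistical bias), yet a
# model trained on it to predict *crime* inherits the skewed proxy.
```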
38. Dynamics, Gaming, and Context
College Admissions and ML
Features:
• GPA
• Extracurriculars
• ACT, SAT
39. Dynamics, Gaming, and Context
Goodhart’s Law (Gaming):
“When a measure becomes a target, it ceases to be a good measure.”
Features:
• GPA:
  • Grade Inflation
• Extracurriculars:
  • Filling the Boxes
• ACT, SAT:
  • Tutoring and Teaching to Test
40. Dynamics, Gaming, and Context
Systemic Bias: When a process supports or reproduces specific outcomes.
Features:
• GPA:
  • Grade Inflation
  • Quality of Education
• Extracurriculars:
  • Filling the Boxes
  • Time and Money
• ACT, SAT:
  • Tutoring and Teaching to Test
  • Less Focus on Test
41. Dynamics, Gaming, and Context
Testing Bias: “Differential Validity of Test Scores for Sub-Groups”
Features:
• GPA:
  • Grade Inflation
  • Quality of Education
  • Unsuitable Curriculum
• Extracurriculars:
  • Filling the Boxes
  • Time and Money
  • Performance Interpretation
• ACT, SAT:
  • Tutoring and Teaching to Test
  • Less Focus on Test
  • Cultural Focus
42. Biggest Problem of All
“the discussion in the technical community…is happening without a moral framework…and you know, it’s kinda amateur hour…, a lot of it is, let’s design a loss function that measures your utility and your fairness and whatever else and just optimize the heck out of it”
--Arvind Narayanan
43. A Few Starter Questions
• Is data the cause of bias in ML decisions?
• Do we need new laws (not disparate impact/treatment) to address these systems?
• How much needs to be considered from outside an organization when claiming that a system is fair?
• Is automated decision making ethically different from human decision making?
• Can there be unfair models for reasons other than group differences?
• Should we study fairness with privacy, security, transparency…or not?