This document surveys approaches to measuring and achieving fairness in machine learning models. It summarizes research on detecting discrimination in models, removing protected features, and imposing different fairness constraints. Specifically, it finds that removing a protected feature such as age slightly decreases model performance, that redundant encodings in the remaining features may still encode the protected feature, and that fairness constraints such as equalized odds come at a measurable cost in profit but are important to consider.
2. Protected Features
• Case: Need to avoid Age discrimination
• So remove the Age feature from the dataset
• Then retrain the model and evaluate performance
3. Protected Features
• We use publicly available credit card default data. From: Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients.
• XGBoost. 5-fold Stratified Cross-Validation.
• Including Age feature: AUC 0.78454
• Removing Age: AUC drops to 0.78326
4. Protected Features
• Business impact of removing Age feature:
• 0.17% decrease in approval rate (keeping default rate constant).
• 0.05% increase in default rate (keeping approval rate constant).
5. Protected Features
• But “Fairness through unawareness” does not work.
• Redundant encodings in the remaining features still hold information about the protected feature.
6. Protected Features
• Use the protected feature as a target.
• We try to predict it from the remaining features.
• XGBoost. 5-Fold Quantile Stratified Cross-Validation.
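Since age is a continuous target, the folds are stratified on quantile bins of the target rather than on class labels. A minimal sketch of such a “quantile stratified” split, assuming scikit-learn and a hypothetical age array:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
age = rng.uniform(20, 70, 500)  # hypothetical continuous target

# Bin the continuous target into 5 quantile buckets, then
# stratify the folds on the bucket labels.
edges = np.quantile(age, [0.2, 0.4, 0.6, 0.8])
buckets = np.digitize(age, edges)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = list(cv.split(np.zeros((len(age), 1)), buckets))
# Every test fold now contains ~20% of each age quantile.
```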
7. Protected Features
• Baseline: Mean age for everyone.
• Result: 7.54611 Mean Absolute Error.
• Improvement: XGBoost Regression.
• Result: 5.99789 Mean Absolute Error, 0.3 R² Score.
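The mean-age baseline, and the R² of a model relative to it, can be computed directly. A minimal numpy sketch with made-up ages and made-up model predictions (all numbers are assumptions, not the experiment's data):

```python
import numpy as np

rng = np.random.default_rng(0)
ages = rng.normal(40, 9.5, 1000)        # hypothetical true ages
preds = ages + rng.normal(0, 7, 1000)   # hypothetical model predictions

# Baseline: predict the mean age for everyone.
baseline_mae = np.abs(ages - ages.mean()).mean()

# Model MAE, and R^2 (1 is perfect, 0 is no better than the mean baseline).
model_mae = np.abs(ages - preds).mean()
r2 = 1 - ((ages - preds) ** 2).sum() / ((ages - ages.mean()) ** 2).sum()
```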
8. Protected Features
• Using the other features we did better than guessing the average.
• 7.54611 − 5.99789 = 1.54822, the “Bayesian Fairness Rate”.
• This is the best we can do without removing other non-protected features.
9. Discrimination-aware Data Mining
• First paper (2008) to look at discrimination in ML models. By: Dino Pedreschi, Salvatore Ruggieri, Franco Turini.
• Used simple rule mining on loan data.
• Measures how much of the model’s performance can be explained by the discriminating feature.
12. Equality of Opportunity in Supervised Learning
• Paper (2016) by Moritz Hardt, Eric Price, Nathan Srebro (Google Research).
• Looks at the groups defined by the protected feature.
• FICO loan data.
14. Equality of Opportunity in Supervised Learning
• Every profit-optimizing model has a threshold at which a decision is made.
• Putting fairness constraints on your model often means losing profit.
• We can study profit for a model under different threshold constraints.
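A minimal sketch of this threshold analysis in plain Python; the scores, repayment outcomes, and the payoffs (+300 per repaid loan, −700 per default) are all assumptions:

```python
def profit(scores, repaid, threshold, gain=300, loss=700):
    """Total profit if we lend to everyone scoring at or above the threshold."""
    total = 0
    for score, ok in zip(scores, repaid):
        if score >= threshold:
            total += gain if ok else -loss
    return total

def max_profit_threshold(scores, repaid, candidates):
    """Unconstrained max-profit: pick the profit-maximizing threshold."""
    return max(candidates, key=lambda t: profit(scores, repaid, t))

# Tiny hypothetical group of applicants: a score and whether they repaid.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
repaid = [True, True, True, False, True, False]
best = max_profit_threshold(scores, repaid, [0.0, 0.35, 0.5, 0.65, 0.75])
# best == 0.65: lending only to the top three applicants earns 900.
```

A fairness constraint then restricts which combinations of per-group thresholds are allowed, which is why the constrained policies on the following slides recover less than 100% of max profit.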
15. Equality of Opportunity in Supervised Learning
• Max-Profit. No fairness constraints. Pick a different profit-maximizing threshold for every group.
• 100% of Max-Profit.
16. Equality of Opportunity in Supervised Learning
• Feature blind. Requires the threshold to be the same for every group.
• 99.3% of Max-Profit.
17. Equality of Opportunity in Supervised Learning
• Equal Opportunity. Picks for each group a threshold such that the fraction of non-defaulting group members that qualify for loans is the same.
• 92.8% of Max-Profit.
18. Equality of Opportunity in Supervised Learning
• Equalized Odds. Requires both the fraction of non-defaulters that qualify for loans and the fraction of defaulters that qualify for loans to be constant across groups.
• 80.2% of Max-Profit.
19. Equality of Opportunity in Supervised Learning
• Demographic Parity. Picks for each group a threshold such that the fraction of group members that qualify for loans is the same.
• 69.8% of Max-Profit.
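The constraints above each fix a different quantity across groups: demographic parity equalizes the overall acceptance rate, while equal opportunity equalizes the acceptance rate among non-defaulters only (the true positive rate). A minimal sketch with two made-up groups (scores, outcomes, and thresholds are all assumptions):

```python
def acceptance_rate(scores, threshold):
    """Fraction of the whole group that qualifies (demographic parity)."""
    return sum(s >= threshold for s in scores) / len(scores)

def true_positive_rate(scores, repaid, threshold):
    """Fraction of non-defaulters that qualify (equal opportunity)."""
    good = [s for s, ok in zip(scores, repaid) if ok]
    return sum(s >= threshold for s in good) / len(good)

# Two hypothetical groups: scores and whether each member repaid.
scores_a, repaid_a = [0.9, 0.8, 0.6, 0.4], [True, True, False, True]
scores_b, repaid_b = [0.7, 0.5, 0.3, 0.2], [True, True, True, False]

# Demographic parity: thresholds (0.7, 0.45) give both groups a 50% acceptance rate.
dp_equal = acceptance_rate(scores_a, 0.7) == acceptance_rate(scores_b, 0.45)

# Equal opportunity: thresholds (0.7, 0.4) give non-defaulters in both groups
# the same 2/3 chance of qualifying.
eo_equal = (true_positive_rate(scores_a, repaid_a, 0.7)
            == true_positive_rate(scores_b, repaid_b, 0.4))
```

Equalized odds additionally requires the defaulters’ qualification rate to match across groups, a stricter constraint, which is why it costs more profit than equal opportunity (80.2% vs 92.8% of Max-Profit above).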
20. Conclusion
• Studying fairness is new, but important.
• Fairness has a measurable cost.
• Ignoring the protected feature may not be enough.
• There are different fairness constraints, with different costs.
• We still need the “unfair” feature in order to detect unfairness.