Fairness-aware Machine Learning:
Practical Challenges and Lessons Learned
KDD 2019 Tutorial
August 2019
Sarah Bird (Microsoft), Ben Hutchinson (Google), Sahin Geyik (LinkedIn), Krishnaram Kenthapadi (LinkedIn), Emre Kıcıman (Microsoft), Margaret Mitchell (Google), Mehrnoosh Sameki (Microsoft)
The Coded Gaze [Joy Buolamwini 2016]
• Face detection
software:
Fails for some
darker faces
https://www.youtube.com/watch?...
Gender Shades [Joy Buolamwini & Timnit Gebru, 2018]
• Facial analysis
software:
Higher accuracy for
light skinned men
• Er...
Algorithmic Bias
▪ Ethical challenges posed by AI
systems
▪ Inherent biases present in society
– Reflected in training dat...
Laws against Discrimination
Immigration Reform and Control Act
Citizenship
Rehabilitation Act of 1973;
Americans with Disa...
Fairness Privacy
Transparency Explainability
Fairness Privacy
Transparency Explainability
Fairness Privacy
Transparency Explainability
Fairness Privacy
Transparency Explainability
Fairness Privacy
Transparency Explainability
Related KDD’19 sessions:
1.Tutorial: Explainable AI in Industry (Sun, 1-5pm)
...
“Fairness by Design” for AI products
Outline
• Algorithmic Bias / Discrimination
• Sources of Data Biases in ML Lifecycle
• Techniques for Fairness in ML
• AI ...
Algorithmic Bias / Discrimination
and broader / related issues
Other Great
Tutorials
Fairness in Machine Learning
Solon Barocas and Moritz Hardt, NeurIPS 2017
Challenges of incorporatin...
[Barocas & Hardt 2017]
"[H]iring could become faster and less expensive, and […] lead
recruiters to more highly skilled people who are better mat...
[Barocas & Hardt 2017]
"But software is not free of human influence. Algorithms are written
and maintained by people, and machine learning algori...
Do Better
Avoid Harm
[Cramer et al 2019]
More positive outcomes & avoiding harmful outcomes
of algorithms for groups of people
[Cramer et al 2019]
[Cramer et al 2019]
More positive outcomes & avoiding harmful outcomes
of automated systems for groups of people
Legally Recognized Protected Classes
Race (Civil Rights Act of 1964); Color (Civil Rights Act of
1964); Sex (Equal Pay Act...
Other
Categories
Societal Categories
i.e., political ideology, language, income,
location, topical interests, (sub)culture...
Types of Harm
Harms of allocation
withhold opportunity or resources
Harms of representation
reinforce subordination along ...
Bias,
Discrimination
& Machine
Learning
Isn’t bias a technical concept?
Selection, sampling, reporting bias, Bias
of an es...
Discrimination is not a general concept
It is domain specific
Concerned with important opportunities that affect people’s ...
Regulated
Domains
Credit (Equal Credit Opportunity Act)
Education (Civil Rights Act of 1964;
Education Amendments of 1972)...
Discrimination
Law and Legal
Terms
Treatment
Disparate Treatment, Equality of
Opportunity, Procedural Fairness
Outcome
Dis...
[https://www.reddit.com/r/GCdebatesQT/comments/7qpbpp/food_for_thought_equality_vs_equity_vs_justice/]
Fairness is
Political
Equal Treatment vs Equal Outcome
Fairness is
Political
Someone must decide
Decisions will depend on the
product, company, laws, country, etc.
Why do this?
Better product and Serving Broader
Population
Responsibility and Social Impact
Legal and Policy
Competitive A...
Industry Best
Practices
for Product Conception, Design, Implementation, and
Evolution
Responsible AI is a complex and broad topic
Microsoft's AI Principles
State of the
Art in Industry
• People: Bring in domain expertise and diversity
• Measure: Emphasize analysis and testing
•...
Process Best
Practices
Identify product goals
Get the right people in the room
Identify stakeholders
Select a fairness app...
Repeat for every
new feature, product
change, etc.
Google's Responsible Fairness Practices
https://ai.google/education/responsible-ai-practices?category=fairness
Summary:
• ...
Fairness
AI systems should treat everyone fairly and
avoid affecting similarly situated groups of
people in different ways...
Sources of Data Biases
in ML Lifecycle
Collaborators
Much of this section is based on survey paper and
tutorial series written by Alexandra Olteanu,
Carlos Casti...
To present a taxonomy of challenges that can
occur at different stages of social data analysis.
To recognize, understand, ...
Design Data Model Application
Design Data Model Application
Design Data Model Application
data bias
Data bias: a systematic
distortion in data that
compromises its use
for a task.
Note: Bias must be considered relative to task
Gender discrimination is
illegal
Gender-specific medical
diagnosis is de...
What does data bias look like?
Measure systematic distortions along 5 data properties
1. Population Biases
2. Behavioral B...
What does data bias look like?
Measure distortions along 5 data properties
1. Population Biases
Differences in demographic...
Example:
Different user
demographics on
different social
platforms
See [Hargittai’07] for statistics about social media...
Systematic distortions must be evaluated in a
task dependent way
Gender Shades
E.g., for many tasks, populations
should ma...
What does data bias look like?
Measure distortions along 5 data properties
1. Population Biases
2. Behavioral Biases
Diffe...
Behavioral Biases from Functional Issues
Platform functionality and algorithms influence human behaviors
and our observati...
Cultural elements and social contexts are
reflected in social datasets
Figure from
[Hannak et al. CSCW 2017]
Societal biases embedded in behavior can be
amplified by algorithms
Users pick
biased options
Biased
actions are
used as
f...
What does data bias look like?
Measure distortions along 5 data properties
1. Population Biases
2. Behavioral Biases
3. Co...
Behavioral Biases from Normative Issues
Community norms and societal biases influence observed behavior
and vary across on...
Privacy concerns affect what content users share, and,
thus, the type of patterns we observe.
Foursquare/Image from [Lindq...
As other media, social
media contains
misinformation and
disinformation
Misinformation is false information,
unintentio...
What does data bias look like?
Measure distortions along 5 data properties
1. Population Biases
2. Behavioral Biases
3. Co...
Behavior-based and connection-based social links are
different
Figure from [Wilson et al. EuroSys’09]
Online social networks formation
also depends on factors
external to the social
platforms
● Geography & distance
● Co-visi...
What does data bias look like?
Measure distortions along 5 data properties
1. Population Biases
2. Behavioral Biases
3. Co...
Different demographics can
exhibit different growth rates
across and within social
platforms
TaskRabbit and Fiverr are ...
E.g., Change in Features over Time
Introducing a new feature or
changing an existing feature
impacts usage patterns on the...
Data Collection
Biases can
come in at
any step
along the
data analysis
pipeline
● Metrics: e.g., reliability, lack of doma...
Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Consider
team composition
for diversity of thou...
Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Understand the task,
stakeholders, and
potentia...
Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Check data sets
Consider data provenance
What i...
Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Check models and validate results
Why is the mo...
Design Data Model Application
Best Practices for Bias Avoidance/Mitigation
Post-Deployment
Ensure optimization and guardra...
Key Takeaways
• Many, complex biases at all stages of data
collection and analysis
• Population, Behavioral, Content Produ...
Techniques for Fairness in ML
Thanks to
Ben Hutchinson
Alex Beutel (Research Scientist, fairness in ML),
Allison Woodruff (UX Research, privacy, fairness...
Techniques
for Fairness
in ML
Product Introspection
Practical Testing
Training Data
Modeling
UI/Product Design
Product
Introspection
Product Introspection (1):
Make Your Key Choices Explicit [Mitchell et al., 2018]
Goals Decision Prediction
Profit from lo...
Product Introspection (2):
Identify Potential Harms
• What are the potential harms?
• Applicants who would have repaid are...
Seek out Diverse Perspectives
• Fairness Experts
• User Researchers
• Privacy Experts
• Legal
• Social Science Backgrounds...
Practical Testing
Launch with Confidence: Testing for Bias
• How will you know if users are being
harmed?
• How will you know if harms are u...
Model Predictions
Evaluate for Inclusion - Confusion Matrix
Model Predictions
Positive Negative
Evaluate for Inclusion - Confusion Matrix
Model Predictions
Positive Negative
● Exists
● Predicted
True Positives
● Doesn’t exist
● Not predicted
True Negatives
Eva...
Model Predictions
Positive Negative
● Exists
● Predicted
True Positives
● Exists
● Not predicted
False Negatives
● Doesn’t...
Efficient Testing for Bias
• Development teams are under multiple
constraints
• Time
• Money
• Human resources
• Access to...
Choose your evaluation metrics in light
of acceptable tradeoffs between
False Positives and False Negatives
Privacy in Images
False Positive: Something that doesn’t
need to be blurred gets blurred.
Can be a bummer.
False Negative:...
Spam Filtering
False Negative: Email that is SPAM is
not caught, so you see it in your inbox.
Usually just a bit annoying....
Types of Practical
Fairness Testing
1. Targeted Tests
2. Quick Tests
3. Comprehensive Tests
4. Ecologically Valid Tests
5....
1. Targeted Tests
Based on prior experience/knowledge
• Computer Vision
⇒ Test for dark skin
• Natural Language Processing...
Targeted Testing of a Gender Classifier
[Joy Buolamwini & Timnit Gebru, 2018]
• Facial recognition
software:
Higher accura...
2. Quick
Tests
• "Cheap"
• Useful throughout product cycle
• Spot check extreme cases
• Low coverage but high informativit...
Quick Tests
for Gender in
Translate
3. Comprehensive Tests
Include sufficient data for each subgroup
• May include relevant combinations of attributes
• Somet...
Comprehensive Testing of a Toxic Language Detector
[Dixon et al., 2018] ConversationAI
Comprehensive Testing of a Toxic Language Detector
[Dixon et al., 2018] Problem: A False Positive Bias
Comprehensive Testing of a Toxic Language Detector
[Dixon et al., 2018]
AUC Metrics for
Comprehensive Testing
• Subgroup AUC:
• Subgroup Positives vs
Subgroup Negatives
• "BPSN" AUC:
• Backgroun...
Comprehensive Testing of a Toxicity Detector
https://github.com/conversationai/perspectiveapi/blob/master/model_cards/Engl...
4. Ecologically Valid
Testing
Data is drawn from a distribution representative of the
deployment distribution
• Goal is NO...
Ecologically Valid Testing:
Distributions Matter
What is being compared?
Over what data?
Challenges with Ecologically Valid Testing
• Post-deployment distributions may not be known
• Product may not be launched ...
5. Adversarial
Tests
Search for rare but extreme harms
• “Poison needle in haystack”
• Requires knowledge of society
Typic...
Hypothetical Example of Adversarial Testing
• Emoji autosuggest: are happy emoji suggested for sad sentences?
My dog has g...
Summary of Practical
Fairness Testing
1. Targeted Tests: domain specific (image, language, etc)
2. Quick Tests: cheap test...
Fairness Testing Practices
are Good ML Practices
• Confidence in your product's fairness
requires fairness testing
• Fairn...
Training Data
Fairness-aware Data Collection
[Holstein et al., 2019]
• ML literature generally assumes data is fixed
• Often the solutio...
Get to Know Your Training Data: Facets Dive
Datasheets for Datasets [Gebru et al., 2018]
Fairness-Aware Data Collection Techniques
1. Address population biases
• Target under-represented (with respect to the use...
Sometimes data biases are
unavoidable
Solution: ML Techniques
Modeling
Practical Concerns with Fair Machine Learning
•Is the training process stable?
•Can we guarantee that fairness policies
wi...
Machine Learning Techniques: Adversarial Training?
P(Label=1) P(Group)
Negative
Gradient
Fairly well-studied with some nic...
Machine Learning: Correlation Loss
[Beutel et al., 2018]
Motivation: Overcome training instability with adversarial traini...
Predicted P(Target) distribution
for “Blue” and “Red” examples
(Illustrative Example)
min Loss(Label, Pred)
Pred = P(Label...
min Loss(Label, Pred)
+ Abs(Corr(Pred, Group))|Label=0
Pred = P(Label=1)
Predicted P(Target) distribution
for “Blue” and “...
● Computed per batch
● Easy to use
● More stable than adversarial
training.
min Loss(Label, Pred)
+ Abs(Corr(Pred, Group))...
Machine Learning: Constrained Optimization
[Cotter et al., 2018]
Motivation: Can we ensure that fairness policies are sati...
Model Cards for Model Reporting [Mitchell et al., 2018]
Further Machine Learning Techniques
Many more approaches are linked to from the tutorial website.
UI/Product Design
Fairness in UI/Product Design
1. Robust UIs handle ML failures gracefully
2. UIs should empower users
AI Fairness and Transparency Tools
10:00am
Overview of Transparency and Fairness Tools
Bias Detection Bias Mitigation Responsible Metadata
Microsoft InterpretML
Micr...
Overview of Transparency and Fairness Tools
Bias Detection Bias Mitigation Responsible Metadata
Microsoft InterpretML
Micr...
Machine Learning Transparency and Fairness
InterpretML github.com/Microsoft/interpret
pip install -U interpret
InterpretML
Goal: to provide researchers and AI developers with a toolkit that
allows for:
• Explaining machine learning m...
Azure Machine Learning Interpretability Toolkit
• Training Time Input: model + training data
• Any models that are trained on datasets in Python `numpy.array`, `pandas.Da...
• Interpretability at training time
• Combination of glass-box models and black-box explainers
• Auto reason code generati...
IBM Open Scale
Goal: to provide AI operations team with a toolkit that allows for:
• Monitoring and re-evaluating machine ...
IBM Open Scale
Goal: to provide AI operations team with a toolkit that allows for:
• Monitoring and re-evaluating machine ...
IBM Open Scale
IBM Open Scale
IBM Open Scale
Fairness
Accuracy
Performance
IBM Open Scale
Overview of Transparency and Fairness Tools
Bias Detection Bias Mitigation Responsible Metadata
Microsoft InterpretML
Micr...
What If Tool
Goal: Code-free probing of machine learning models
• Feature perturbations (what if scenarios)
• Counterfactu...
What If Tool
IBM Fairness 360
Datasets
Toolbox
Fairness metrics (30+)
Bias mitigation algorithms (9+)
Guidance
Industry-specific tutori...
IBM Fairness 360
Datasets
Toolbox
Fairness metrics (30+)
Bias mitigation algorithms (9+)
Guidance
Industry-specific tutori...
IBM Fairness 360
Example:
Datasets
Toolbox
Fairness metrics (30+)
Bias mitigation algorithms (9+)
Guidance
Industry-specif...
IBM Fairness 360
Example:
Datasets
Toolbox
Fairness metrics (30+)
Bias mitigation algorithms (9+)
Guidance
Industry-specif...
IBM Fairness 360
Datasets
Toolbox
Fairness metrics (30+)
Bias mitigation algorithms (9+)
Guidance
Industry-specific tutori...
IBM Fairness 360
Datasets
Toolbox
Fairness metrics (30+)
Bias mitigation algorithms (9+)
Guidance
Industry-specific tutori...
IBM Fairness 360
Datasets
Toolbox
Fairness metrics (30+)
Bias mitigation algorithms (9+)
Guidance
Industry-specific tutori...
• The toolkit should only be used in a very limited setting:
allocation or risk assessment problems with well-defined
prot...
Appropriateness of AIF360
• The toolkit should only be used in a very limited setting:
allocation or risk assessment probl...
Appropriateness of AIF360
• The toolkit should only be used in a very limited setting:
allocation or risk assessment probl...
Microsoft Research Fairlearn
Wrapper around any classification/regression algorithm
• easily integrated into existing ML s...
• Define fairness metric w.r.t. protected attribute(s)
• ML goal becomes minimizing classification/regression
error while...
Microsoft Research Fairlearn
Goal: find a classifier/regressor [in some family]
that minimizes classification/regression e...
Microsoft Research Fairlearn
Overview of Transparency and Fairness Tools
Bias Detection Bias Mitigation Responsible Metadata
Microsoft InterpretML
Micr...
Datasheets for Datasets [Gebru et al., 2018]
• Better data-related documentation
• Datasheets for datasets: every dataset,...
Model Cards for Model Reporting [Mitchell et al., 2018]
Fact Sheets [Arnold et al., 2019]
• Is distinguished from “model cards” and “datasheets” in that the
focus is on the final...
Overview of Transparency and Fairness Tools
Bias Detection Bias Mitigation Responsible Metadata
Microsoft InterpretML
Micr...
Fairness Methods in Practice
(Case Studies)
Deep Dive: Talent Search
Fairness in AI @ LinkedIn
Create economic opportunity for every
member of the global workforce
LinkedIn’s Vision
Connect the world's professionals t...
LinkedIn Economic Graph
630M members | 30M companies | 20M jobs | 35K skills | 90K schools | 100B+ updates viewed
AI @LinkedIn
25 B
ML A/B experiments
per week
data processed
offline per day
2002.15 PB
data processed
nearline per day
2 ...
Guiding Principle:
“Diversity by Design”
“Diversity by Design” in LinkedIn’s Talent Solutions
Insights to
Identify Diverse
Talent Pools
Representative
Talent Searc...
Plan for Diversity
Plan for Diversity
Identify Diverse Talent Pools
Inclusive Job Descriptions / Recruiter Outreach
Representative Ranking for Talent Search
S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search, KDD 2019
Intuition for Measuring Representativeness
• Ideal: Top ranked results should follow a desired distribution on
gender/age/...
Desired Proportions within the Attribute of Interest
• Compute the proportions of the values of the attribute (e.g., gende...
Measuring (Lack of) Representativeness
• Skew@k
• (Logarithmic) ratio of the proportion of candidates having a given attri...
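As a rough illustration of the Skew@k idea described above (the logarithmic ratio of a value's share in the top k to its desired share, as in the Geyik et al. paper), here is a minimal sketch; the ranked list, attribute values, and desired proportion are made-up examples.

```python
import math

def skew_at_k(ranked_attrs, value, k, desired_proportion, eps=1e-6):
    """Log ratio of the top-k proportion of `value` to its desired proportion."""
    top_k = ranked_attrs[:k]
    observed = sum(1 for a in top_k if a == value) / k
    return math.log((observed + eps) / (desired_proportion + eps))

# Toy example: desired proportion of "F" derived from the qualified candidate pool
ranked = ["M", "M", "F", "M", "M", "F", "M", "M", "M", "M"]
print(skew_at_k(ranked, "F", k=10, desired_proportion=0.4))  # < 0 => under-represented
```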
Fairness-aware Reranking Algorithm (Simplified)
• Partition the set of potential candidates into different buckets for
eac...
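The following is a much-simplified sketch of that bucket-based idea (my own illustration, not LinkedIn's production algorithm): keep per-attribute-value buckets sorted by score and, at each position, prefer a value whose count in the current prefix has fallen behind its desired share.

```python
from collections import defaultdict

def rerank(candidates, desired, k):
    """candidates: list of (id, score, attr_value); desired: {attr_value: proportion}."""
    buckets = defaultdict(list)
    for cand in sorted(candidates, key=lambda c: -c[1]):
        buckets[cand[2]].append(cand)          # each bucket stays sorted by score

    ranked, counts = [], defaultdict(int)
    for pos in range(1, k + 1):
        if not any(buckets.values()):
            break
        # attribute values currently below their minimum count for this prefix length
        behind = [v for v, p in desired.items()
                  if buckets[v] and counts[v] < int(p * pos)]
        pool = behind if behind else [v for v in buckets if buckets[v]]
        # among eligible values, take the one whose best remaining candidate scores highest
        best_value = max(pool, key=lambda v: buckets[v][0][1])
        choice = buckets[best_value].pop(0)
        counts[best_value] += 1
        ranked.append(choice)
    return ranked
```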
Architecture
Validating Our Approach
• Gender Representativeness
• Over 95% of all searches are representative compared to the qualifie...
Lessons
learned
• Post-processing approach desirable
• Model agnostic
• Scalable across different model
choices for our ap...
Acknowledgements
•Team:
• AI/ML: Sahin Cem Geyik, Stuart Ambler, Krishnaram Kenthapadi
• Application Engineering: Gurwinde...
Reflections
• Lessons from fairness challenges →
Need “Fairness by Design” approach
when building AI products
• Case studi...
Conversational Agents
Responsible AI
and Conversational
Agents
Background: Conversational Agents
• Social bots
• Informational bots
• Task-oriented bots
Eliza, Joseph Weizenbaum (MIT) 1...
Design Data Model Application
Design Data Model Application
Design Data Model Application
Questions to ask during
AI Design
Who is affected by AI?
How might AI cause harms?
Stakeholders: Who might be affected?
1. Humans speaking with the agent
• Emotional harms, misinformation, threaten task co...
How might AI cause harms?
Functional Harms
• Misrepresentation of capabilities
• Misinforming user about task status
• Mis...
How might AI cause harms? Functional Harms
Functional Harms
• Misrepresentation of capabilities
• Misinforming user about ...
How might AI cause harms? Functional Harms
Functional Harms
• Misrepresentation of capabilities
• Misinforming user about ...
How might AI cause harms? Functional Harms
Functional Harms
• Misrepresentation of capabilities
• Misinforming user about ...
How might AI cause harms?
Functional Harms
• Misrepresentation of capabilities
• Misinforming user about task status
• Mis...
How might AI cause harms?
Social Harms: Harms to Individuals
• Inciting/encouraging harmful behavior
• Self-harm, suicide
...
How might AI cause harms?
Social Harms: Harms to Communities
• Promoting violence, war, ethnic cleansing, …
• Including pr...
Why is this hard?
Language is ambiguous, complex,
with social context
Examples of complex failures:
• Failure to deflect/t...
Why is this hard?
Language is ambiguous, complex,
with social context
Examples of complex failures:
• Failure to deflect/t...
Why is this hard?
Language is ambiguous, complex,
with social context
Examples of complex failures:
• Failure to deflect/t...
Design Data Model Application
Implications for data collection
Common data sources
• Hand-written rules
• Existing conversational data (e.g., social med...
Design Data Model Application
Design Data Model Application
Responsible bots: 10 guidelines for
developers of conversational AI
1. Articulate the purpos...
Design Data Model Application
Responsible bots: 10 guidelines for
developers of conversational AI
1. Articulate the purpos...
Design Data Model Application
Responsible bots: 10 guidelines for
developers of conversational AI
1. Articulate the purpos...
Key take-away points
• Many stakeholders affected by conversational agent AIs
• Not only people directly interacting with ...
Acknowledgments
• Chris Brockett, Bill Dolan, Michel Galley, Ece Kamar
Google Assistant
Google
Assistant
Key Points:
• Think about user harms
How does your product make people feel
• Adversarial ("stress") test...
Computer Vision
Google Camera
Key points:
• Check for unconscious bias
• Comprehensive testing:
"make sure this works
for everybody"
Night Sight
This is a “Shirley Card”
Named after Kodak studio model
Shirley Page, these cards were the
primary method for calibrating...
Until about 1990, virtually all
Shirley Cards featured Caucasian
women.
SKIN TONE IN PHOTOGRAPHY
SOURCES
Color film was bu...
As a result, photos featuring
people with light skin looked
fairly accurate.
SKIN TONE IN PHOTOGRAPHY
SOURCES
Color film w...
Photos featuring people with
darker skin, not so much...
SKIN TONE IN PHOTOGRAPHY
SOURCES
Color film was built for white p...
Google Clips
Google Clips
"We created controlled datasets by
sampling subjects from different genders
and skin tones in a balanced mann...
Geena Davis Inclusion Quotient
[with Geena Davis Institute on Gender in Media]
Machine Translation
(Historical)
Gender
Pronouns in
Translate
Three Step Approach
1. Detect Gender-Neutral Queries
Train a text classifier to detect when a Turkish query is gender-neutral.
• trained on th...
2. Generate Gender-Specific Translations
• Training: Modify training data to add an additional input token specifying the
...
3. Check for Accuracy
Verify:
1. If the requested feminine translation is feminine.
2. If the requested masculine translat...
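As a toy illustration of step 1 above (detecting gender-neutral source queries with a text classifier), here is a minimal sketch; the Turkish examples, labels, and character n-gram features are illustrative placeholders, not the production system.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training set: 1 = gender-neutral query, 0 = gender is explicit
queries = ["o bir doktor", "kadın doktor", "erkek hemşire", "o bir mühendis"]
labels = [1, 0, 0, 1]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression())
clf.fit(queries, labels)

# 1 => offer both feminine and masculine translations downstream
print(clf.predict(["o bir öğretmen"]))
```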
Result: Reduced Gender Bias in Translate
Smart Compose
Adversarial Testing
for Smart Compose
in Gmail
Adversarial Testing
for Smart Compose
in Gmail
Adversarial Testing
for Smart Compose
in Gmail
Adversarial Testing
for Smart Compose
in Gmail
Key Takeaways
Good ML Practices Go a Long Way
Lots of low hanging fruit in terms of
improving fairness simply by using
machine learning ...
Breadth and Depth Required
Looking End-to-End is critical
• Need to be aware of bias and potential
problems at every stage...
Process Best
Practices
Identify product goals
Get the right people in the room
Identify stakeholders
Select a fairness app...
Beyond
Accuracy
Performance
Cost
Fairness and Bias
Privacy
Security
Safety
Robustness
The Real
World is What
Matters
Decisions should be
made considering the
real world goals and
outcomes
You must have
people...
Key Open Problems in
Applied Fairness
Key Open Problems in Applied Fairness
What if you don’t have the
sensitive attributes?
When should you use what
approach? ...
Related Tutorials / Resources
• Sara Hajian, Francesco Bonchi, and Carlos Castillo, Algorithmic bias: From discrimination
...
Fairness Privacy
Transparency Explainability
Related KDD’19 sessions:
1.Tutorial: Explainable AI in Industry (Sun, 1-5pm)
...
Thanks! Questions?
•Tutorial website: https://sites.google.com/view/kdd19-fairness-tutorial
•Feedback most welcome ☺
• sl...
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We motivate the need for adopting a "fairness by design" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we focus on the application of fairness-aware machine learning techniques in practice by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we present open problems and research directions for the data mining / machine learning community.
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KDD 2019 Tutorial)
  1. 1. Fairness-aware Machine Learning: Practical Challenges and Lessons Learned KDD 2019 Tutorial August 2019 Sarah Bird (Microsoft), Ben Hutchinson (Google), Sahin Geyik (LinkedIn), Krishnaram Kenthapadi (LinkedIn), Emre Kıcıman (Microsoft), Margaret Mitchell (Google), Mehrnoosh Sameki (Microsoft) https://sites.google.com/view/kdd19-fairness-tutorial
  2. 2. The Coded Gaze [Joy Buolamwini 2016] • Face detection software: Fails for some darker faces https://www.youtube.com/watch?v=KB9sI9rY3cA
  3. 3. Gender Shades [Joy Buolamwini & Timnit Gebru, 2018] • Facial analysis software: Higher accuracy for light skinned men • Error rates for dark skinned women: 20% - 34%
  4. 4. Algorithmic Bias ▪ Ethical challenges posed by AI systems ▪ Inherent biases present in society – Reflected in training data – AI/ML models prone to amplifying such biases ▪ ACM FAT* conference / KDD’16 & NeurIPS’17 Tutorials
  5. 5. Laws against Discrimination Immigration Reform and Control Act Citizenship Rehabilitation Act of 1973; Americans with Disabilities Act of 1990 Disability status Civil Rights Act of 1964 Race Age Discrimination in Employment Act of 1967 Age Equal Pay Act of 1963; Civil Rights Act of 1964 Sex And more...
  6. 6. Fairness Privacy Transparency Explainability
  7. 7. Fairness Privacy Transparency Explainability
  8. 8. Fairness Privacy Transparency Explainability
  9. 9. Fairness Privacy Transparency Explainability
  10. 10. Fairness Privacy Transparency Explainability Related KDD’19 sessions: 1.Tutorial: Explainable AI in Industry (Sun, 1-5pm) 2.Workshop: Explainable AI/ML (XAI) for Accountability, Fairness, and Transparency (Mon) 3.Social Impact Workshop (Wed, 8:15 – 11:45) 4.Keynote: Cynthia Rudin, Do Simpler Models Exist and How Can We Find Them? (Thu, 8-9am) 5.Research Track Session RT17: Interpretability (Thu, 10-12) 6.Several papers on fairness (e.g., ADS7 (Thu, 10-12), ADS9 (Thu, 1:30-3:30))
  11. 11. “Fairness by Design” for AI products
  12. 12. Outline • Algorithmic Bias / Discrimination • Sources of Data Biases in ML Lifecycle • Techniques for Fairness in ML • AI Fairness Tools • Case Studies • Key Takeaways
  13. 13. Algorithmic Bias / Discrimination and broader / related issues
  14. 14. Other Great Tutorials Fairness in Machine Learning Solon Barocas and Moritz Hardt, NeurIPS 2017 Challenges of incorporating algorithmic fairness into practice Henriette Cramer, Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miroslav Dudík, Hanna Wallach, Sravana Reddy, Jean Garcia-Gathright, FAT* 2019 Defining and Designing Fair Algorithms Sam Corbett-Davies, Sharad Goel, ICML 2018 The Trouble with Bias Kate Crawford, NeurIPS 2017 Keynote
  15. 15. [Barocas & Hardt 2017]
  16. 16. "[H]iring could become faster and less expensive, and […] lead recruiters to more highly skilled people who are better matches for their companies. Another potential result: a more diverse workplace. The software relies on data to surface candidates from a wide variety of places and match their skills to the job requirements, free of human biases." Miller (2015) [Barocas & Hardt 2017]
  17. 17. [Barocas & Hardt 2017]
  18. 18. "But software is not free of human influence. Algorithms are written and maintained by people, and machine learning algorithms adjust what they do based on people’s behavior. As a result […] algorithms can reinforce human prejudices." Miller (2015) [Barocas & Hardt 2017]
  19. 19. Do Better Avoid Harm [Cramer et al 2019]
  20. 20. More positive outcomes & avoiding harmful outcomes of algorithms for groups of people [Cramer et al 2019]
  21. 21. [Cramer et al 2019] More positive outcomes & avoiding harmful outcomes of automated systems for groups of people
  22. 22. Legally Recognized Protected Classes Race (Civil Rights Act of 1964); Color (Civil Rights Act of 1964); Sex (Equal Pay Act of 1963; Civil Rights Act of 1964); Religion (Civil Rights Act of 1964); National origin (Civil Rights Act of 1964); Citizenship (Immigration Reform and Control Act); Age (Age Discrimination in Employment Act of 1967); Pregnancy (Pregnancy Discrimination Act); Familial status (Civil Rights Act of 1968); Disability status (Rehabilitation Act of 1973; Americans with Disabilities Act of 1990); Veteran status (Vietnam Era Veterans' Readjustment Assistance Act of 1974; Uniformed Services Employment and Reemployment Rights Act); Genetic information (Genetic Information Nondiscrimination Act) [Barocas & Hardt 2017]
  23. 23. Other Categories Societal Categories, e.g., political ideology, language, income, location, topical interests, (sub)culture, physical traits, etc. Intersectional Subpopulations, e.g., women in tech Application-specific subpopulations, e.g., device type
  24. 24. Types of Harm Harms of allocation withhold opportunity or resources Harms of representation reinforce subordination along the lines of identity, stereotypes [Cramer et al 2019, Shapiro et al., 2017, Kate Crawford, “The Trouble With Bias” keynote N(eur)IPS’17]
  25. 25. Bias, Discrimination & Machine Learning Isn’t bias a technical concept? Selection, sampling, reporting bias, Bias of an estimator, Inductive bias Isn’t discrimination the very point of machine learning? Unjustified basis for differentiation [Barocas & Hardt 2017]
  26. 26. Discrimination is not a general concept It is domain specific Concerned with important opportunities that affect people’s life chances It is feature specific Concerned with socially salient qualities that have served as the basis for unjustified and systematically adverse treatment in the past [Barocas & Hardt 2017]
  27. 27. Regulated Domains Credit (Equal Credit Opportunity Act) Education (Civil Rights Act of 1964; Education Amendments of 1972) Employment (Civil Rights Act of 1964) Housing (Fair Housing Act) ‘Public Accommodation’ (Civil Rights Act of 1964) Extends to marketing and advertising; not limited to final decision [Barocas & Hardt 2017]
  28. 28. Discrimination Law and Legal Terms Treatment Disparate Treatment, Equality of Opportunity, Procedural Fairness Outcome Disparate Impact, Distributive justice, Minimized inequality of outcome
  29. 29. [https://www.reddit.com/r/GCdebatesQT/comments/7qpbpp/food_for_thought_equality_vs_equity_vs_justice/]
  30. 30. Fairness is Political Equal Treatment vs Equal Outcome
  31. 31. Fairness is Political Someone must decide Decisions will depend on the product, company, laws, country, etc.
  32. 32. Why do this? Better product and Serving Broader Population Responsibility and Social Impact Legal and Policy Competitive Advantage and Brand [Barocas & Hardt 2017]
  33. 33. Industry Best Practices for Product Conception, Design, Implementation, and Evolution
  34. 34. Responsible AI is a complex and broad topic
  35. 35. Microsoft's AI Principles
  36. 36. State of the Art in Industry • People: Bring in domain expertise and diversity • Measure: Emphasize analysis and testing • Process: Focus on processes and reviews
  37. 37. Process Best Practices Identify product goals Get the right people in the room Identify stakeholders Select a fairness approach Analyze and evaluate your system Mitigate issues Monitor continuously and have escalation plans Auditing and Transparency
  38. 38. Repeat for every new feature, product change, etc.
  39. 39. Google's Responsible Fairness Practices https://ai.google/education/responsible-ai-practices?category=fairness Summary: • Design your product using concrete goals for fairness and inclusion. • Engage with social scientists and other relevant experts. • Set fairness goals • Check system for unfair biases. • Include diverse testers and adversarial/stress testing. • Consider feedback loops • Analyze performance. • Evaluate user experience in real-world scenarios. • Use representative datasets to train and test your model.
  40. 40. Fairness AI systems should treat everyone fairly and avoid affecting similarly situated groups of people in different ways Key considerations 1. Understand the scope, spirit, and potential uses of the AI system 2. Attract a diverse pool of talent 3. Put processes and tools in place to identify bias in datasets and machine learning algorithms 4. Leverage human review and domain expertise 5. Research and employ best practices, analytical techniques, and tools
  41. 41. Sources of Data Biases in ML Lifecycle
  42. 42. Collaborators Much of this section is based on survey paper and tutorial series written by Alexandra Olteanu, Carlos Castillo, Fernando Diaz, Emre Kıcıman
  43. 43. Survey goals: To present a taxonomy of challenges that can occur at different stages of social data analysis. To recognize, understand, or quantify some major classes of limitations around data. To give us food for thought, by looking critically at our work.
  44. 44. Design Data Model Application
  45. 45. Design Data Model Application
  46. 46. Design Data Model Application
  47. 47. data bias Data bias: a systematic distortion in data that compromises its use for a task.
  48. 48. Note: Bias must be considered relative to task 48 Gender discrimination is illegal Gender-specific medical diagnosis is desirable
  49. 49. What does data bias look like? Measure systematic distortions along 5 data properties 1. Population Biases 2. Behavioral Biases 3. Content Production Biases 4. Linking Biases 5. Temporal Biases
  50. 50. What does data bias look like? Measure distortions along 5 data properties 1. Population Biases Differences in demographics or other user characteristics between a user population represented in a dataset or platform and a target population 2. Behavioral Biases 3. Content Production Biases 4. Linking Biases 5. Temporal Biases
  51. 51. Example: Different user demographics on different social platforms See [Hargittai’07] for statistics about social media use among young adults according to gender, race and ethnicity, and parental educational background. Figure from http://www.pewinternet.org/2016/11/11/social-media-update-2016/
  52. 52. Systematic distortions must be evaluated in a task dependent way Gender Shades E.g., for many tasks, populations should match target population, to improve external validity But for some other tasks, subpopulations require approximately equal representation to achieve task parity http://gendershades.org/
  53. 53. What does data bias look like? Measure distortions along 5 data properties 1. Population Biases 2. Behavioral Biases Differences in user behavior across platforms or contexts, or across users represented in different datasets 3. Content Production Biases 4. Linking Biases 5. Temporal Biases
  54. 54. Behavioral Biases from Functional Issues Platform functionality and algorithms influence human behaviors and our observations of human behaviors [Miller et al. ICWSM’16] Figure from: http://grouplens.org/blog/investigating-the-potential-for-miscommunication-using-emoji/
  55. 55. Cultural elements and social contexts are reflected in social datasets 55 Figure from [Hannak et al. CSCW 2017]
  56. 56. Societal biases embedded in behavior can be amplified by algorithms Users pick biased options Biased actions are used as feedback System learns to mimic biased options System presents options, influencing user choice
  57. 57. What does data bias look like? Measure distortions along 5 data properties 1. Population Biases 2. Behavioral Biases 3. Content Production Biases Lexical, syntactic, semantic, and structural differences in the contents generated by users 4. Linking Biases 5. Temporal Biases
  58. 58. Behavioral Biases from Normative Issues Community norms and societal biases influence observed behavior and vary across online and offline communities and contexts What kind of pictures would you share on Facebook, but not on LinkedIn? Are individuals comfortable contradicting popular opinions? E.g., after singer Prince died, most SNs showed public mourning. But not anonymous site PostSecret The same mechanism can embed different meanings in different contexts [Tufekci ICWSM’14] [the meaning of retweets or likes] “could range from affirmation to denunciation to sarcasm to approval to disgust”
  59. 59. Privacy concerns affect what content users share, and, thus, the type of patterns we observe. Foursquare/Image from [Lindqvist et al. CHI’11] The awareness of being observed by others impacts user behavior: Privacy and safety concerns
  60. 60. Like other media, social media contains misinformation and disinformation Misinformation is false information, unintentionally spread Disinformation is false information, deliberately spread Figures from [Kumar et al. 2016] Hoaxes on Wikipedia: (left) impact as number of views per day for hoaxes surviving at least 7 days, and (right) time until a hoax gets detected and flagged.
  61. 61. What does data bias look like? Measure distortions along 5 data properties 1. Population Biases 2. Behavioral Biases 3. Content Production Biases 4. Linking Biases Differences in the attributes of networks obtained from user connections, interactions, or activity 5. Temporal Biases
  62. 62. Behavior-based and connection-based social links are different 62 Figure from [Wilson et al. EuroSys’09]
  63. 63. Online social networks formation also depends on factors external to the social platforms ● Geography & distance ● Co-visits ● Dynamics of offline relations ● [...] Figure from [Gilbert and Karahalios CHI 2009] 63
  64. 64. What does data bias look like? Measure distortions along 5 data properties 1. Population Biases 2. Behavioral Biases 3. Content Production Biases 4. Linking Biases 5. Temporal Biases Differences in populations and behaviors over time
  65. 65. Different demographics can exhibit different growth rates across and within social platforms 65 TaskRabbit and Fiverr are online freelance marketplaces. Figure from [Hannak et al. CSCW 2017]
  66. 66. E.g., Change in Features over Time Introducing a new feature or changing an existing feature impacts usage patterns on the platform.
  67. 67. Data Collection Biases can come in at any step along the data analysis pipeline ● Metrics: e.g., reliability, lack of domain insights ● Interpretation: e.g., contextual validity, generalizability ● Disclaimers: e.g., lack of negative results and reproducibility ● Functional: biases due to platform affordances and algorithms ● Normative: biases due to community norms ● External: biases due to phenomena outside social platforms ● Non-individuals: e.g., organizations, automated agents ● Acquisition: biases due to, e.g., API limits ● Querying: biases due to, e.g., query formulation ● Filtering: biases due to removal of data “deemed” irrelevant ● Cleaning: biases due to, e.g., default values ● Enrichment: biases from manual or automated annotations ● Aggregation: e.g., grouping, organizing, or structuring data ● Qualitative Analyses: lack generalizability, interpret. biases ● Descriptive Statistics: confounding bias, obfuscated measurements ● Prediction & Inferences: data representation, perform. variations ● Observational studies: peer effects, select. bias, ignorability Data Processing Evaluation Data Analysis Data Source
  68. 68. Design Data Model Application Best Practices for Bias Avoidance/Mitigation
  69. 69. Design Data Model Application Best Practices for Bias Avoidance/Mitigation Consider team composition for diversity of thought, background and experiences
  70. 70. Design Data Model Application Best Practices for Bias Avoidance/Mitigation Understand the task, stakeholders, and potential for errors and harm
  71. 71. Design Data Model Application Best Practices for Bias Avoidance/Mitigation Check data sets Consider data provenance What is the data intended to represent? Verify through qualitative, experimental, survey and other methods
  72. 72. Design Data Model Application Best Practices for Bias Avoidance/Mitigation Check models and validate results Why is the model making decision? What mechanisms would explain results? Is supporting evidence consistent? Twyman’s law: The more unusual the result, more likely it’s an error
  73. 73. Design Data Model Application Best Practices for Bias Avoidance/Mitigation Post-Deployment Ensure optimization and guardrail metrics consistent w/responsible practices and avoid harms Continual monitoring, including customer feedback Have a plan to identify and respond to failures and harms as they occur
  74. 74. Key Takeaways • Many, complex biases at all stages of data collection and analysis • Population, Behavioral, Content Production, Linking, Temporal Biases • Mitigate through deeper investigation, understanding • Read more: Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries, Olteanu, Castillo, Diaz and Kıcıman
  75. 75. Techniques for Fairness in ML
  76. 76. Thanks to Ben Hutchinson Alex Beutel (Research Scientist, fairness in ML), Allison Woodruff (UX Research, privacy, fairness and ethics), Andrew Zaldivar (Developer Advocate, ethics and fairness in AI), Hallie Benjamin (Senior Strategist, ethics and fairness in ML), Jamaal Barnes (Program Manager, fairness in ML), Josh Lovejoy (UX Designer, People and AI Research; now Microsoft), Margaret Mitchell (Research Scientist, ethics and fairness in AI), Rebecca White (Program Manager, fairness in ML) and others!
  77. 77. Techniques for Fairness in ML Product Introspection Practical Testing Training Data Modeling UI/Product Design
  78. 78. Product Introspection
  79. 79. Product Introspection (1): Make Your Key Choices Explicit [Mitchell et al., 2018] Goals Decision Prediction Profit from loans Whether to lend Loan will be repaid Justice, Public safety Whether to detain Crime committed if not detained • Goals are ideally measurable • What are your non-goals? • Which decisions are you not considering? • What is the relationship between Prediction and Decision?
  80. 80. Product Introspection (2): Identify Potential Harms • What are the potential harms? • Applicants who would have repaid are not given loans • Convicts who would not commit a crime are locked up. • Are there also longer term harms? • Applicants are given loans, then go on to default, harming their credit score • Are some harms especially bad?
  81. 81. Seek out Diverse Perspectives • Fairness Experts • User Researchers • Privacy Experts • Legal • Social Science Backgrounds • Diverse Identities • Gender • Sexual Orientation • Race • Nationality • Religion
  82. 82. Practical Testing
  83. 83. Launch with Confidence: Testing for Bias • How will you know if users are being harmed? • How will you know if harms are unfairly distributed? • Detailed testing practices are often not covered in academic papers • Discussing testing requirements is a useful focal point for cross-functional teams
  84. 84. Model Predictions Evaluate for Inclusion - Confusion Matrix
  85. 85. Model Predictions Positive Negative Evaluate for Inclusion - Confusion Matrix
  86. 86. Model Predictions Positive Negative ● Exists ● Predicted True Positives ● Doesn’t exist ● Not predicted True Negatives Evaluate for Inclusion - Confusion Matrix
  87. 87. Model Predictions Positive Negative ● Exists ● Predicted True Positives ● Exists ● Not predicted False Negatives ● Doesn’t exist ● Predicted False Positives ● Doesn’t exist ● Not predicted True Negatives Evaluate for Inclusion - Confusion Matrix
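To connect the confusion matrix to subgroup evaluation, here is a minimal sketch (my own, not from the slides) that computes false-positive and false-negative rates per group with scikit-learn; the toy labels and group names are purely illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def per_group_rates(y_true, y_pred, group):
    """Return false-positive and false-negative rates for each subgroup."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        tn, fp, fn, tp = confusion_matrix(
            y_true[mask], y_pred[mask], labels=[0, 1]).ravel()
        rates[g] = {
            "FPR": fp / (fp + tn) if (fp + tn) else float("nan"),
            "FNR": fn / (fn + tp) if (fn + tp) else float("nan"),
        }
    return rates

# Toy example with two groups
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(per_group_rates(y_true, y_pred, group))
```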
  88. 88. Efficient Testing for Bias • Development teams are under multiple constraints • Time • Money • Human resources • Access to data • How can we efficiently test for bias? • Prioritization • Strategic testing
  89. 89. Choose your evaluation metrics in light of acceptable tradeoffs between False Positives and False Negatives
  90. 90. Privacy in Images False Positive: Something that doesn’t need to be blurred gets blurred. Can be a bummer. False Negative: Something that needs to be blurred is not blurred. Identity theft. False Positives Might be Better than False Negatives
  91. 91. Spam Filtering False Negative: Email that is SPAM is not caught, so you see it in your inbox. Usually just a bit annoying. False Positive: Email flagged as SPAM is removed from your inbox. If it’s from a friend or loved one, it’s a loss! False Negatives Might Be Better than False Positives
  92. 92. Types of Practical Fairness Testing 1. Targeted Tests 2. Quick Tests 3. Comprehensive Tests 4. Ecologically Valid Tests 5. Adversarial Testing
  93. 93. 1. Targeted Tests Based on prior experience/knowledge • Computer Vision ⇒ Test for dark skin • Natural Language Processing ⇒ Test for gender stereotypes Cf. smoke tests (non-exhaustive tests that check that most important functions work)
  94. 94. Targeted Testing of a Gender Classifier [Joy Buolamwini & Timnit Gebru, 2018] • Facial recognition software: Higher accuracy for light skinned men • Error rates for dark skinned women: 20% - 34%
  95. 95. 2. Quick Tests • "Cheap" • Useful throughout product cycle • Spot check extreme cases • Low coverage but high informativity • Need to be designed thoughtfully, e.g. • World knowledge • Prior product failures
  96. 96. Quick Tests for Gender in Translate
  97. 97. 3. Comprehensive Tests Include sufficient data for each subgroup • May include relevant combinations of attributes • Sometimes synthetic data is appropriate Particularly important if model will be used in larger system Cf. Unit tests (verify correct outputs for wide range of correct inputs)
  98. 98. Comprehensive Testing of a Toxic Language Detector [Dixon et al., 2018] ConversationAI

  99. 99. Comprehensive Testing of a Toxic Language Detector [Dixon et al., 2018] Problem: A False Positive Bias
  100. 100. Comprehensive Testing of a Toxic Language Detector [Dixon et al., 2018]
  101. 101. AUC Metrics for Comprehensive Testing • Subgroup AUC: • Subgroup Positives vs Subgroup Negatives • "BPSN" AUC: • Background Positives vs Subgroup Negatives • "BNSP" AUC: • Background Negatives vs Subgroup Positives
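A minimal sketch of those three AUCs (my reading of the definitions on the slide, following Dixon et al. and the Perspective model card): "subgroup" examples mention the identity term, "background" examples do not, labels are binary, and higher scores mean more toxic.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bias_aucs(labels, scores, in_subgroup):
    labels, scores, in_subgroup = map(np.asarray, (labels, scores, in_subgroup))
    subgroup, background = in_subgroup, ~in_subgroup

    # Subgroup AUC: subgroup positives vs subgroup negatives
    sub_auc = roc_auc_score(labels[subgroup], scores[subgroup])

    # BPSN AUC: background positives vs subgroup negatives
    bpsn = (background & (labels == 1)) | (subgroup & (labels == 0))
    bpsn_auc = roc_auc_score(labels[bpsn], scores[bpsn])

    # BNSP AUC: background negatives vs subgroup positives
    bnsp = (background & (labels == 0)) | (subgroup & (labels == 1))
    bnsp_auc = roc_auc_score(labels[bnsp], scores[bnsp])

    return {"subgroup_auc": sub_auc, "bpsn_auc": bpsn_auc, "bnsp_auc": bnsp_auc}

# Toy example (each slice must contain both classes for AUC to be defined)
labels      = [1, 0, 1, 0, 1, 0, 1, 0]
scores      = [0.9, 0.2, 0.8, 0.6, 0.7, 0.3, 0.4, 0.5]
in_subgroup = np.array([True, True, True, True, False, False, False, False])
print(bias_aucs(labels, scores, in_subgroup))
```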
  102. 102. Comprehensive Testing of a Toxicity Detector https://github.com/conversationai/perspectiveapi/blob/master/model_cards/English/toxicity.md
  103. 103. 4. Ecologically Valid Testing Data is drawn from a distribution representative of the deployment distribution • Goal is NOT to be representative of the training distribution • (When appropriate) Condition on labels & certainty Example usage scenarios : • Continuous monitoring • You have historical product usage data • You can estimate user distribution reasonably well
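One cheap way to approximate this when only an offline test set is available is to reweight it toward an estimated deployment mix before computing aggregate metrics; the sketch below is my own illustration under that assumption, not a recipe from the tutorial.

```python
import numpy as np

def deployment_weighted_accuracy(y_true, y_pred, group, deployment_share):
    """Accuracy with each group's contribution rescaled to its deployment share."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    weights = np.zeros(len(y_true))
    for g, share in deployment_share.items():
        mask = group == g
        weights[mask] = share / mask.sum()   # group contributes `share` of total mass
    correct = (y_true == y_pred).astype(float)
    return float(np.sum(weights * correct) / np.sum(weights))

# e.g. the test set is 50/50 but deployment is expected to be roughly 80/20
acc = deployment_weighted_accuracy(
    y_true=[1, 0, 1, 0], y_pred=[1, 0, 0, 0],
    group=["A", "A", "B", "B"],
    deployment_share={"A": 0.8, "B": 0.2})
print(acc)
```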
  104. 104. Ecologically Valid Testing: Distributions Matter What is being compared? Over what data?
  105. 105. Challenges with Ecologically Valid Testing • Post-deployment distributions may not be known • Product may not be launched yet! • Sensitive attributes often not available in deployment • User distributions may change • We may want user distributions to change • e.g., broaden user base
  106. 106. 5. Adversarial Tests Search for rare but extreme harms • “Poison needle in haystack” • Requires knowledge of society Typical usage scenario: • Close to launch
  107. 107. Hypothetical Example of Adversarial Testing • Emoji autosuggest: are happy emoji suggested for sad sentences? My dog has gone to heaven Suggest: Input: 😊
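That hypothetical check can be captured as a tiny regression test; in this sketch `suggest_emoji` is a placeholder for the model's suggestion function (not a real API) and the sentence list is hand-curated.

```python
HAPPY_EMOJI = {"😊", "😂", "🎉"}
SAD_SENTENCES = [
    "My dog has gone to heaven",
    "I just lost my job",
]

def check_no_happy_emoji_for_sad_sentences(suggest_emoji):
    """Fail if any hand-curated sad sentence gets a happy emoji suggestion."""
    for sentence in SAD_SENTENCES:
        suggestions = set(suggest_emoji(sentence))
        assert not (suggestions & HAPPY_EMOJI), (
            f"Happy emoji suggested for: {sentence!r}")
```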
  108. 108. Summary of Practical Fairness Testing 1. Targeted Tests: domain specific (image, language, etc) 2. Quick Tests: cheap tests throughout dev cycle 3. Comprehensive Tests: thorough 4. Ecologically Valid Tests: real-world data 5. Adversarial Testing: find poison needles
  109. 109. Fairness Testing Practices are Good ML Practices • Confidence in your product's fairness requires fairness testing • Fairness testing has a role throughout the product iteration lifecycle • Contextual concerns should be used to prioritize fairness testing
  110. 110. Training Data
  111. 111. Fairness-aware Data Collection [Holstein et al., 2019] • ML literature generally assumes data is fixed • Often the solution is more and/or better training data But: need to be Thoughtful! When might more Data not Help? • If your data sampling techniques are biased • Fundamental problems in data quality [Eckhouse et al., 2018] • What does your data really represent? E.g. crimes vs arrests • Recall: Product Introspection: How do Predictions relate to Decisions?
  112. 112. Get to Know Your Training Data: Facets Dive
  113. 113. Datasheets for Datasets [Gebru et al., 2018]
  114. 114. Fairness-Aware Data Collection Techniques 1. Address population biases • Target under-represented (with respect to the user population) groups 2. Address representation issues • Oversample from minority groups • Sufficient data from each group may be required to avoid model treating them as "outliers" 3. Data augmentation: synthesize data for minority groups • E.g. from observed "he is a doctor" → synthesize "she is a doctor" 4. Fairness-aware active learning • Collect more data for group with highest error rates
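A minimal pandas sketch of two of these ideas, oversampling an under-represented group and naive gender-swap augmentation; the toy DataFrame, column names, and word map are illustrative assumptions, not a recommended pipeline.

```python
import pandas as pd

df = pd.DataFrame({
    "text":  ["he is a doctor", "she is a nurse", "he is an engineer"],
    "group": ["M", "F", "M"],
})

# (2) Oversample the under-represented group up to the majority count
target = df["group"].value_counts().max()
balanced = pd.concat(
    [g.sample(target, replace=True, random_state=0) for _, g in df.groupby("group")],
    ignore_index=True)

# (3) Naive data augmentation by swapping gendered words
swap = {"he": "she", "she": "he", "his": "her", "her": "his"}
augmented = df.assign(
    text=df["text"].apply(
        lambda s: " ".join(swap.get(w, w) for w in s.split())))

print(balanced)
print(augmented)
```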
  115. 115. Sometimes data biases are unavoidable Solution: ML Techniques
  116. 116. Modeling
  117. 117. Practical Concerns with Fair Machine Learning •Is the training process stable? •Can we guarantee that fairness policies will be satisfied? • Cf. Legal requirements in education, employment, finance
  118. 118. Machine Learning Techniques: Adversarial Training? P(Label=1) P(Group) Negative Gradient Fairly well-studied with some nice theoretical guarantees. But can be difficult to train. Features, Label, Group
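A sketch of the "negative gradient" idea in PyTorch (my own illustration; the slide is framework-agnostic): a gradient-reversal layer lets a group-prediction head train normally while pushing the shared representation to carry less group information.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # flip (and scale) the gradient flowing back into the shared encoder
        return -ctx.lam * grad_output, None

class AdversarialModel(nn.Module):
    def __init__(self, n_features, lam=1.0):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU())
        self.label_head = nn.Linear(16, 1)   # predicts P(Label=1)
        self.group_head = nn.Linear(16, 1)   # adversary: predicts P(Group)
        self.lam = lam

    def forward(self, x):
        h = self.shared(x)
        label_logit = self.label_head(h)
        group_logit = self.group_head(GradReverse.apply(h, self.lam))
        return label_logit, group_logit

# Training loss (sketch): BCE(label_logit, label) + BCE(group_logit, group).
# The reversed gradient makes the shared encoder *hurt* the group predictor.
```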
  119. 119. Machine Learning: Correlation Loss [Beutel et al., 2018] Motivation: Overcome training instability with adversarial training Key idea: include fairness objective in the loss function
  120. 120. Predicted P(Target) distribution for “Blue” and “Red” examples (Illustrative Example) min Loss(Label, Pred) Pred = P(Label=1) Features, Label, Group Machine Learning Techniques: Correlation Loss
  121. 121. min Loss(Label, Pred) + Abs(Corr(Pred, Group))|Label=0 Pred = P(Label=1) Predicted P(Target) distribution for “Blue” and “Red” examples (Illustrative Example) Features, Label, Group Machine Learning Techniques: Correlation Loss
  122. 122. ● Computed per batch ● Easy to use ● More stable than adversarial training. min Loss(Label, Pred) + Abs(Corr(Pred, Group))|Label=0 Pred = P(Label=1) Features, Label, Group Machine Learning Techniques: Correlation Loss
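A per-batch sketch of that penalty in PyTorch (my own illustration, not the authors' code): the absolute correlation between predictions and group membership, computed only on Label = 0 examples, added to the primary loss.

```python
import torch

def correlation_loss(pred, group, label):
    """pred: P(Label=1); group: 0/1 group indicator; label: 0/1 labels."""
    mask = label == 0                      # condition on Label = 0, as on the slide
    p = pred[mask] - pred[mask].mean()
    g = group[mask].float() - group[mask].float().mean()
    corr = (p * g).sum() / (p.norm() * g.norm() + 1e-8)   # Pearson correlation
    return corr.abs()

# total_loss = bce_loss(pred, label) + lambda_fair * correlation_loss(pred, group, label)
```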
  123. 123. Machine Learning: Constrained Optimization [Cotter et al., 2018] Motivation: Can we ensure that fairness policies are satisfied? • Fairness goals are explicitly stated as constraints on predictions, e.g. • FPR on group 1 <= 0.8 * FPR on group 2 • Machine learner optimizes objective function subject to the constraints
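For intuition only, here is a simple Lagrangian-style penalty for the example constraint above, using the mean predicted probability on true negatives as a differentiable stand-in for FPR; the actual method in Cotter et al. uses a more careful proxy-constraint formulation, so treat this purely as a sketch.

```python
import torch
import torch.nn.functional as F

def soft_fpr(pred_prob, label, group, g):
    """Mean predicted P(Label=1) over true negatives in group g (a soft FPR)."""
    neg = (label == 0) & (group == g)
    return pred_prob[neg].mean() if neg.any() else pred_prob.new_zeros(())

def constrained_objective(pred_prob, label, group, lagrange_mult=1.0):
    bce = F.binary_cross_entropy(pred_prob, label.float())         # primary objective
    # constraint: FPR(group 1) <= 0.8 * FPR(group 2); penalize only violations
    violation = soft_fpr(pred_prob, label, group, 1) - 0.8 * soft_fpr(pred_prob, label, group, 2)
    return bce + lagrange_mult * torch.clamp(violation, min=0.0)
```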
  124. 124. Model Cards for Model Reporting [Mitchell et al., 2018]
  125. 125. Further Machine Learning Techniques Many more approaches are linked to from the tutorial website.
  126. 126. UI/Product Design
  127. 127. Fairness in UI/Product Design 1. Robust UIs handle ML failures gracefully 2. UIs should empower users
  128. 128. AI Fairness and Transparency Tools 10:00am
  129. 129. Overview of Transparency and Fairness Tools Bias Detection Bias Mitigation Responsible Metadata Microsoft InterpretML Microsoft Azure Interpretability Toolkit IBM Open Scale Datasheets for datasets Model Cards IBM Fairness 360 Microsoft Fairlearn Google What-if H2O Fact Sheets
  130. 130. Overview of Transparency and Fairness Tools Bias Detection Bias Mitigation Responsible Metadata Microsoft InterpretML Microsoft Azure Interpretability Toolkit IBM Open Scale Datasheets for datasets Model Cards IBM Fairness 360 Microsoft Fairlearn Google What-if H2O Fact Sheets
  131. 131. Machine Learning Transparency and Fairness
  132. 132. InterpretML github.com/Microsoft/interpret pip install -U interpret
  133. 133. InterpretML Goal: to provide researchers and AI developers with a toolkit that allows for: • Explaining machine learning models globally on all data, or locally on a specific data point using state-of-the-art technologies • Easily adding new explainers and comparing them to the state-of-the-art explainers • A common API and data structure across the integrated libraries github.com/Microsoft/interpret pip install -U interpret
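A short usage sketch based on InterpretML's public examples (API details may differ across versions): train a glass-box Explainable Boosting Machine and visualize its global and local explanations.

```python
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

show(ebm.explain_global())                       # per-feature contributions
show(ebm.explain_local(X_test[:5], y_test[:5]))  # explanations for five test rows
```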
  134. 134. Azure Machine Learning Interpretability Toolkit
  135. 135. • Training Time Input: model + training data • Any models that are trained on datasets in Python `numpy.array`, `pandas.DataFrame`, `iml.datatypes.DenseData`, or `scipy.sparse.csr_matrix` format • Accepts both models and pipelines as input. • Model: model must implement the prediction function `predict` or `predict_proba` that conforms to the Scikit convention. • Pipeline: the explanation function assumes that the running pipeline script returns a prediction. • Inferencing Time Input: test data Azure Machine Learning Interpretability Toolkit
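If a model does not already follow that scikit-learn convention, a thin adapter is usually enough; this sketch is illustrative, with `score_fn` standing in for whatever scoring callable you actually have.

```python
import numpy as np

class SklearnStyleWrapper:
    """Expose sklearn-style predict / predict_proba around an arbitrary scorer."""

    def __init__(self, score_fn):
        self._score_fn = score_fn   # assumed to return P(class=1) per row

    def predict_proba(self, X):
        p1 = np.asarray(self._score_fn(X)).reshape(-1)
        return np.column_stack([1.0 - p1, p1])

    def predict(self, X):
        return (self.predict_proba(X)[:, 1] >= 0.5).astype(int)
```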
  136. 136. H2O • Interpretability at training time • Combination of glass-box models and black-box explainers • Auto reason code generation for local predictions • Ability to cross-reference other techniques to ensure stability and consistency in results
  137. 137. IBM Open Scale Goal: to provide AI operations team with a toolkit that allows for: • Monitoring and re-evaluating machine learning models after deployment
  138. 138. IBM Open Scale Goal: to provide AI operations team with a toolkit that allows for: • Monitoring and re-evaluating machine learning models after deployment • ACCURACY • FAIRNESS • PERFORMANCE
  139. 139. IBM Open Scale
  140. 140. IBM Open Scale
  141. 141. IBM Open Scale Fairness Accuracy Performance
  142. 142. IBM Open Scale
  143. 143. Overview of Transparency and Fairness Tools Bias Detection Bias Mitigation Responsible Metadata Microsoft InterpretML Microsoft Azure Interpretability Toolkit IBM Open Scale Datasheets for datasets Model Cards IBM Fairness 360 Microsoft Fairlearn Google What-if H2O Fact Sheets
  144. 144. What If Tool Goal: Code-free probing of machine learning models • Feature perturbations (what if scenarios) • Counterfactual example analysis • [Classification] Explore the effects of different classification thresholds, taking into account constraints such as different numerical fairness metrics.
  145. 145. What If Tool
  146. 146. IBM Fairness 360 Datasets Toolbox Fairness metrics (30+) Bias mitigation algorithms (9+) Guidance Industry-specific tutorials
  147. 147. IBM Fairness 360 Datasets Toolbox Fairness metrics (30+) Bias mitigation algorithms (9+) Guidance Industry-specific tutorials
  148. 148. IBM Fairness 360 Example: Datasets Toolbox Fairness metrics (30+) Bias mitigation algorithms (9+) Guidance Industry-specific tutorials
  149. 149. IBM Fairness 360 Example: Datasets Toolbox Fairness metrics (30+) Bias mitigation algorithms (9+) Guidance Industry-specific tutorials
  150. 150. IBM Fairness 360 Datasets Toolbox Fairness metrics (30+) Bias mitigation algorithms (9+) Guidance Industry-specific tutorials
  151. 151. IBM Fairness 360: Datasets; Toolbox: Fairness metrics (30+), Bias mitigation algorithms (9+); Guidance: Industry-specific tutorials. Pre-processing algorithm: a bias mitigation algorithm that is applied to training data. In-processing algorithm: a bias mitigation algorithm that is applied to a model during its training. Post-processing algorithm: a bias mitigation algorithm that is applied to predicted labels.
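As an illustration of how these pieces fit together, here is a hedged sketch using the open-source `aif360` package: build a dataset object, compute two group fairness metrics, then apply a pre-processing mitigation (Reweighing). The CSV file, column names, and group encodings are assumptions for illustration.

```python
# Sketch of AIF360's dataset -> metric -> pre-processing flow, assuming a numeric,
# pre-encoded pandas DataFrame with binary label "hired" and protected attribute "sex".
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

df = pd.read_csv("applicants.csv")                  # hypothetical training data
dataset = BinaryLabelDataset(df=df, label_names=["hired"],
                             protected_attribute_names=["sex"],
                             favorable_label=1, unfavorable_label=0)

privileged = [{"sex": 1}]                           # illustrative group encodings
unprivileged = [{"sex": 0}]

metric = BinaryLabelDatasetMetric(dataset, unprivileged_groups=unprivileged,
                                  privileged_groups=privileged)
print("Statistical parity difference:", metric.statistical_parity_difference())
print("Disparate impact:", metric.disparate_impact())

# Pre-processing mitigation: reweight examples before training any model on them.
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
dataset_transf = rw.fit_transform(dataset)
```

In-processing and post-processing algorithms in the toolkit follow the same dataset-object interface, which is what makes them swappable within a pipeline.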
  152. 152. IBM Fairness 360: Datasets; Toolbox: Fairness metrics (30+), Bias mitigation algorithms (9+); Guidance: Industry-specific tutorials
  153. 153. Appropriateness of AIF360 @IBM Research’19 • The toolkit should only be used in a very limited setting: allocation or risk assessment problems with well-defined protected attributes in which one would like to have some sort of statistical or mathematical notion of sameness • The metrics and algorithms clearly do not capture the full scope of fairness in all situations • Only a starting point to a broader discussion among multiple stakeholders on overall decision-making workflows
  154. 154. Appropriateness of AIF360 • The toolkit should only be used in a very limited setting: allocation or risk assessment problems with well-defined protected attributes in which one would like to have some sort of statistical or mathematical notion of sameness • The metrics and algorithms clearly do not capture the full scope of fairness in all situations • Only a starting point to a broader discussion among multiple stakeholders on overall decision-making workflows @IBM Research’19
  155. 155. Appropriateness of AIF360 • The toolkit should only be used in a very limited setting: allocation or risk assessment problems with well-defined protected attributes in which one would like to have some sort of statistical or mathematical notion of sameness • The metrics and algorithms clearly do not capture the full scope of fairness in all situations • Only a starting point to a broader discussion among multiple stakeholders on overall decision-making workflows @IBM Research’19
  156. 156. Microsoft Research Fairlearn Wrapper around any classification/regression algorithm: • easily integrated into existing ML systems • doesn’t require test-time access to the protected attribute. Versatile: • many measures of fairness • multiple protected attributes with many values
  157. 157. "Fair" Classification/Regression • Define a fairness metric w.r.t. the protected attribute(s) • The ML goal becomes minimizing classification/regression error while minimizing unfairness according to the metric. Challenges: 1. Defining an appropriate fairness metric 2. Learning an accurate model subject to the metric
  158. 158. Microsoft Research Fairlearn Goal: find a classifier/regressor [in some family] that minimizes classification/regression error subject to fairness constraints (user-defined fairness metric) Given: a standard ML algorithm as a black box Approach: iteratively call black box and reweight (and possibly relabel) the data
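A minimal sketch of this reduction approach with the open-source `fairlearn` package: a standard scikit-learn estimator is wrapped and trained subject to a fairness constraint (demographic parity here, purely for illustration); the dataset and column names are placeholders.

```python
# Sketch of the reductions approach: optimize accuracy subject to a fairness
# constraint by repeatedly reweighting the data fed to a black-box learner.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

df = pd.read_csv("applicants.csv")              # hypothetical dataset
X = df.drop(columns=["hired", "sex"])           # features (protected attribute excluded)
y = df["hired"]
sensitive = df["sex"]                           # needed at training time only

mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(solver="liblinear"),   # any sklearn-style black box
    constraints=DemographicParity())                    # user-chosen fairness constraint
mitigator.fit(X, y, sensitive_features=sensitive)

y_pred = mitigator.predict(X)                   # no protected attribute needed here
```

Note that the sensitive attribute is supplied only during `fit`, which matches the "no test-time access to the protected attribute" property above.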
  159. 159. Microsoft Research Fairlearn
  160. 160. Overview of Transparency and Fairness Tools. Bias Detection: Microsoft InterpretML, Microsoft Azure Interpretability Toolkit, IBM Open Scale, Google What-If, H2O. Bias Mitigation: IBM Fairness 360, Microsoft Fairlearn. Responsible Metadata: Datasheets for Datasets, Model Cards, Fact Sheets
  161. 161. Datasheets for Datasets [Gebru et al., 2018] • Better data-related documentation • Datasheets for datasets: every dataset, model, or pre-trained API should be accompanied by a datasheet that documents its • Creation • Intended uses • Limitations • Maintenance • Legal and ethical considerations • Etc.
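A datasheet is ultimately structured documentation, so teams sometimes keep a machine-readable stub alongside the dataset. The sketch below is purely illustrative and not a standardized schema; every field name and value is a placeholder covering the items listed above.

```python
# Illustrative datasheet stub; field names and values are placeholders only.
datasheet = {
    "dataset": "member-profiles-2019-snapshot",          # hypothetical dataset name
    "creation": {
        "collected_by": "Data Platform team",
        "collection_method": "opt-in profile fields, sampled Jan-Jun 2019",
        "preprocessing": "deduplication, PII removal",
    },
    "intended_uses": ["candidate ranking research"],
    "limitations": ["under-represents members with incomplete profiles"],
    "maintenance": {"owner": "data-steward@example.com", "update_cadence": "quarterly"},
    "legal_ethical": ["privacy review completed", "no sensitive attributes released"],
}
```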
  162. 162. Model Cards for Model Reporting [Mitchell et al., 2018]
  163. 163. Fact Sheets [Arnold et al., 2019] • Is distinguished from “model cards” and “datasheets” in that the focus is on the final AI service: • What is the intended use of the service output? • What algorithms or techniques does this service implement? • Which datasets was the service tested on? (Provide links to datasets that were used for testing, along with corresponding datasheets.) • Describe the testing methodology. • Describe the test results. • Etc.
  164. 164. Overview of Transparency and Fairness Tools. Bias Detection: Microsoft InterpretML, Microsoft Azure Interpretability Toolkit, IBM Open Scale, Google What-If, H2O. Bias Mitigation: IBM Fairness 360, Microsoft Fairlearn. Responsible Metadata: Datasheets for Datasets, Model Cards, Fact Sheets
  165. 165. Fairness Methods in Practice (Case Studies)
  166. 166. Deep Dive: Talent Search Fairness in AI @ LinkedIn
  167. 167. Create economic opportunity for every member of the global workforce LinkedIn’s Vision Connect the world's professionals to make them more productive and successful LinkedIn’s Mission
  168. 168. LinkedIn Economic Graph: 630M members, 30M companies, 20M jobs, 35K skills, 90K schools, 100B+ updates viewed
  169. 169. AI @LinkedIn, at scale: ML A/B experiments per week; petabytes of data processed offline and nearline per day; a graph with over a billion nodes and tens of billions of edges; tens of billions of parameters in ML models
  170. 170. Guiding Principle: “Diversity by Design”
  171. 171. “Diversity by Design” in LinkedIn’s Talent Solutions Insights to Identify Diverse Talent Pools Representative Talent Search Results Diversity Learning Curriculum
  172. 172. Plan for Diversity
  173. 173. Plan for Diversity
  174. 174. Identify Diverse Talent Pools
  175. 175. Inclusive Job Descriptions / Recruiter Outreach
  176. 176. Representative Ranking for Talent Search S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search, KDD’19. [Microsoft’s AI/ML conference (MLADS’18). Distinguished Contribution Award] Building Representative Talent Search at LinkedIn (LinkedIn engineering blog)
  177. 177. Intuition for Measuring Representativeness • Ideal: Top ranked results should follow a desired distribution on gender/age/… • E.g., same distribution as the underlying talent pool • Inspired by “Equal Opportunity” definition [Hardt et al, NIPS’16]
  178. 178. Desired Proportions within the Attribute of Interest • Compute the proportions of the values of the attribute (e.g., gender, gender-age combination) amongst the set of qualified candidates • “Qualified candidates” = Set of candidates that match the search query criteria • Retrieved by LinkedIn’s Galene search engine • Desired proportions could also be obtained based on legal mandate / voluntary commitment
  179. 179. Measuring (Lack of) Representativeness • Skew@k • (Logarithmic) ratio of the proportion of candidates having a given attribute value among the top k ranked results to the corresponding desired proportion • Variants: • MinSkew: Minimum over all attribute values • MaxSkew: Maximum over all attribute values • Normalized Discounted Cumulative Skew • Normalized Discounted Cumulative KL-divergence
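A minimal sketch of Skew@k and its MinSkew/MaxSkew variants as defined above; the ranking, attribute values, and desired proportions are illustrative.

```python
import math

def skew_at_k(ranked_attrs, desired_props, value, k):
    """Skew@k for one attribute value: log(observed top-k proportion / desired proportion)."""
    top_k = ranked_attrs[:k]
    observed = top_k.count(value) / len(top_k)
    eps = 1e-6                              # avoid log(0) when a value is absent
    return math.log((observed + eps) / (desired_props[value] + eps))

def min_max_skew(ranked_attrs, desired_props, k):
    skews = [skew_at_k(ranked_attrs, desired_props, v, k) for v in desired_props]
    return min(skews), max(skews)           # MinSkew@k, MaxSkew@k

# Illustrative use: attribute values of the top-ranked candidates vs. the desired
# proportions (e.g., proportions among the qualified candidates).
ranking = ["M", "M", "M", "F", "M", "M", "F", "M", "F", "M"]
desired = {"F": 0.4, "M": 0.6}
print(skew_at_k(ranking, desired, "F", 5))   # negative: "F" under-represented in top 5
print(min_max_skew(ranking, desired, 10))    # (MinSkew@10, MaxSkew@10)
```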
  180. 180. Fairness-aware Reranking Algorithm (Simplified) • Partition the set of potential candidates into different buckets for each attribute value • Rank the candidates in each bucket according to the scores assigned by the machine-learned model • Merge the ranked lists, balancing the representation requirements and the selection of highest scored candidates • Algorithmic variants based on how we choose the next attribute
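Below is a simplified, illustrative sketch of the merge step: walk down the ranking and prefer the highest-scored remaining candidate, unless some attribute value would fall below its desired minimum count at that position. The production algorithm and its variants (see the KDD’19 paper) differ in how the next attribute is chosen; this is only a sketch.

```python
import math

def rerank(buckets, desired_props, k):
    """Greedy merge of per-attribute-value ranked lists (each sorted by score,
    descending) into a top-k list that tracks the desired proportions."""
    result = []
    counts = {v: 0 for v in buckets}
    idx = {v: 0 for v in buckets}                   # next unused candidate per bucket
    for pos in range(1, k + 1):
        available = [v for v in buckets if idx[v] < len(buckets[v])]
        if not available:
            break
        # attribute values currently below their minimum quota for this position
        below_min = [v for v in available
                     if counts[v] < math.floor(desired_props[v] * pos)]
        pool = below_min if below_min else available
        # among eligible values, take the one whose next candidate scores highest
        choice = max(pool, key=lambda v: buckets[v][idx[v]][1])
        result.append(buckets[choice][idx[choice]])
        idx[choice] += 1
        counts[choice] += 1
    return result

# Illustrative input: attribute value -> [(candidate_id, model_score), ...]
buckets = {"F": [("f1", 0.91), ("f2", 0.62)],
           "M": [("m1", 0.95), ("m2", 0.93), ("m3", 0.88)]}
print(rerank(buckets, {"F": 0.4, "M": 0.6}, k=5))
# -> [('m1', 0.95), ('m2', 0.93), ('f1', 0.91), ('m3', 0.88), ('f2', 0.62)]
```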
  181. 181. Architecture
  182. 182. Validating Our Approach • Gender Representativeness • Over 95% of all searches are representative compared to the qualified population of the search • Business Metrics • A/B test over LinkedIn Recruiter users for two weeks • No significant change in business metrics (e.g., # InMails sent or accepted) • Ramped to 100% of LinkedIn Recruiter users worldwide
  183. 183. Lessons learned • Post-processing approach desirable • Model agnostic • Scalable across different model choices for our application • Acts as a “fail-safe” • Robust to application-specific business logic • Easier to incorporate as part of existing systems • Build a stand-alone service or component for post-processing • No significant modifications to the existing components • Complementary to efforts to reduce bias from training data & during model training
  184. 184. Acknowledgements •Team: • AI/ML: Sahin Cem Geyik, Stuart Ambler, Krishnaram Kenthapadi • Application Engineering: Gurwinder Gulati, Chenhui Zhai • Analytics: Patrick Driscoll, Divyakumar Menghani • Product: Rachel Kumar •Acknowledgements • Deepak Agarwal, Erik Buchanan, Patrick Cheung, Gil Cottle, Nadia Fawaz, Rob Hallman, Joshua Hartman, Sara Harrington, Heloise Logan, Stephen Lynch, Lei Ni, Igor Perisic, Ram Swaminathan, Ketan Thakkar, Janardhanan Vembunarayanan, Hinkmond Wong, Lin Yang, Liang Zhang, Yani Zhang
  185. 185. Reflections • Lessons from fairness challenges → need a “Fairness by Design” approach when building AI products • Case studies on fairness-aware ML in practice • Collaboration/consensus across key stakeholders (product, legal, PR, engineering, AI, …)
  186. 186. Conversational Agents
  187. 187. Responsible AI and Conversational Agents
  188. 188. Background: Conversational Agents • Social bots • Informational bots • Task-oriented bots Eliza, Joseph Weizenbaum (MIT) 1964
  189. 189. Design Data Model Application
  190. 190. Design Data Model Application
  191. 191. Design Data Model Application Questions to ask during AI Design Who is affected by AI? How might AI cause harms?
  192. 192. Stakeholders: Who might be affected? 1. Humans speaking with the agent • Emotional harms, misinformation, threats to task completion 2. The agent “owner” • Harm to the practices and reputation of the owner 3. Third-party individuals and groups • People mentioned in conversations! 4. Audiences listening to the conversation • This may include the general public!
  193. 193. How might AI cause harms? Functional Harms • Misrepresentation of capabilities • Misinforming user about task status • Misunderstanding user and doing the wrong task • Revealing private information inappropriately. Example: User: “Can you order sushi for me?” Bot: “Yes, I can do that!” User: “Great, one California roll please” Bot: “I don’t understand.” User: “?!?!”
  194. 194. How might AI cause harms? Functional Harms • Misrepresentation of capabilities • Misinforming user about task status • Misunderstanding user and doing the wrong task • Revealing private information inappropriately. Example: User: “When will my order arrive?” Bot: “In just 1 minute” User: “Where’s my order?” Bot: “Arriving in 1 minute” User: “?!?!”
  195. 195. How might AI cause harms? Functional Harms • Misrepresentation of capabilities • Misinforming user about task status • Misunderstanding user and doing the wrong task • Revealing private information inappropriately. Example: User: “Tell me a joke” Bot: “Ordering egg yolks” User: “?!?!”
  196. 196. How might AI cause harms? Functional Harms • Misrepresentation of capabilities • Misinforming user about task status • Misunderstanding user and doing the wrong task • Revealing private information inappropriately. Example: User: “What’s Bob Smith’s number?” Bot: “Bob Smith’s CC number is …” User: “?!?!”
  197. 197. How might AI cause harms? Functional Harms • Misrepresentation of capabilities • Misinforming user about task status • Misunderstanding user and doing the wrong task • Revealing private information inappropriately These harms are even more problematic when they systematically occur for some groups of people but not others
  198. 198. How might AI cause harms? Social Harms: Harms to Individuals • Inciting/encouraging harmful behavior • Self-harm, suicide • Violence or harassment against others • Discouraging good behavior, e.g., visiting doctors • Providing wrong information • Medical, financial, legal advice • Verbal harassment • Bullying, sexual harassment
  199. 199. How might AI cause harms? Social Harms: Harms to Communities • Promoting violence, war, ethnic cleansing, … • Including promoting related organizations and philosophies • Engaging in hate speech, disparagement, mocking, … • Including inadvertent or inappropriate imitation (dialect, accent, …) • Disruption to social processes • Election disruption, fake news, false disaster response, …
  200. 200. Why is this hard? Language is ambiguous, complex, with social context. Examples of complex failures: • Failure to deflect/terminate contentious topics • Refusing to discuss when disapproval would be better • Polite agreement with unrecognized bias. Example: User: “Let’s talk about <something evil>” Bot: “Yes, I can do that!”
  201. 201. Why is this hard? Language is ambiguous, complex, with social context. Examples of complex failures: • Failure to deflect/terminate contentious topics • Polite agreement with unrecognized bias • Refusing to discuss when disapproval would be better. Example: User: “Men are better at <whatever> than women” Bot: “Sounds ok.”
  202. 202. Why is this hard? Language is ambiguous, complex, with social context. Examples of complex failures: • Failure to deflect/terminate contentious topics • Polite agreement with unrecognized bias • Refusing to discuss when disapproval would be better. Example: User: “I was bullied at school because I’m Muslim” Bot: “I don’t like talking about religion”
  203. 203. Design Data Model Application
  204. 204. Implications for data collection Common data sources • Hand-written rules • Existing conversational data (e.g., social media) • New online conversations (e.g., from new customer interactions) Cleaning training data • For anonymization • E.g., remove individual names, but keep famous names (fictional characters, celebrities, politicians, …) • Ensure data adheres to social norms • Not enough to filter individual words: filter “I hate [X]”, and you’ll miss “I’m not a fan of [X].” • Remember that meanings change with context • Differentiate between bot input and bot output in training data • Remove offensive text from bot-output training data • But don’t remove it from bot inputs → allows learning of good responses to bad inputs
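A toy sketch of the asymmetric cleaning idea above: offensive text is removed from bot-output targets but kept as bot inputs, so the model can still learn good responses to bad prompts. The phrase matcher is a stand-in for a real content classifier, and all strings are illustrative.

```python
# Asymmetric cleaning of (user_input, bot_response) training pairs: drop pairs
# whose *response* is unacceptable, but keep unacceptable *inputs* that are
# paired with a good response.
import re

BLOCK_PATTERNS = [r"\bi hate\b", r"\bnot a fan of\b"]   # toy patterns, illustrative only

def is_offensive(text):
    text = text.lower()
    return any(re.search(p, text) for p in BLOCK_PATTERNS)

def clean_training_pairs(pairs):
    kept = []
    for user_input, bot_response in pairs:
        if is_offensive(bot_response):
            continue                                 # never teach the bot to *say* this
        kept.append((user_input, bot_response))      # offensive inputs stay, paired
                                                     # with acceptable responses
    return kept

pairs = [
    ("I hate my neighbors", "I'd rather not talk about people that way."),   # kept
    ("what do you think of X?", "I'm not a fan of X, they ruin everything"), # dropped
]
print(clean_training_pairs(pairs))
```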
  205. 205. Design Data Model Application
  206. 206. Design Data Model Application Responsible bots: 10 guidelines for developers of conversational AI 1. Articulate the purpose of your bot 2. Be transparent that you use bots 3. Elevate to a human when needed 4. Design bot to respect cultural norms 5. Ensure bot is reliable (metrics, feedback) 6. Ensure your bot treats people fairly 7. Ensure your bot respects privacy 8. Ensure your bot handles data securely 9. Ensure your bot is accessible 10. Accept responsibility https://www.microsoft.com/en-us/research/publication/responsible-bots/
  207. 207. Design Data Model Application Responsible bots: 10 guidelines for developers of conversational AI 1. Articulate the purpose of your bot 2. Be transparent that you use bots 3. Elevate to a human when needed 4. Design bot to respect cultural norms 5. Ensure bot is reliable (metrics, feedback) 6. Ensure your bot treats people fairly 7. Ensure your bot respects privacy 8. Ensure your bot handles data securely 9. Ensure your bot is accessible 10. Accept responsibility https://www.microsoft.com/en-us/research/publication/responsible-bots/
  208. 208. Design Data Model Application Responsible bots: 10 guidelines for developers of conversational AI 1. Articulate the purpose of your bot 2. Be transparent that you use bots 3. Elevate to a human when needed 4. Design bot to respect cultural norms 5. Ensure bot is reliable (metrics, feedback) 6. Ensure your bot treats people fairly 7. Ensure your bot respects privacy 8. Ensure your bot handles data securely 9. Ensure your bot is accessible 10. Accept responsibility https://www.microsoft.com/en-us/research/publication/responsible-bots/
  209. 209. Key take-away points • Many stakeholders affected by conversational agent AIs • Not only people directly interacting with AI, but also indirectly affected • Many potential functional, social harms to individuals, communities • Functional harms exacerbated when systematically biased against groups • Challenges include complexity and ambiguity of natural language • Avoiding these harms requires careful consideration across the entire AI lifecycle.
  210. 210. Acknowledgments • Chris Brockett, Bill Dolan, Michel Galley, Ece Kamar
  211. 211. Google Assistant
  212. 212. Google Assistant Key points: • Think about user harms: how does your product make people feel? • Adversarial (“stress”) testing for all Google Assistant launches • People might say racist, sexist, or homophobic things • Diverse testers • Think about expanding who your users could and should be • Consider the diversity of your users
  213. 213. Computer Vision
  214. 214. Google Camera Key points: • Check for unconscious bias • Comprehensive testing: "make sure this works for everybody"
  215. 215. Night Sight
  216. 216. This is a “Shirley Card” Named after a Kodak studio model named Shirley Page, they were the primary method for calibrating color when processing film. SKIN TONE IN PHOTOGRAPHY SOURCES Color film was built for white people. Here's what it did to dark skin. (Vox) How Kodak's Shirley Cards Set Photography's Skin-Tone Standard, NPR
  217. 217. Until about 1990, virtually all Shirley Cards featured Caucasian women. SKIN TONE IN PHOTOGRAPHY SOURCES Color film was built for white people. Here's what it did to dark skin. (Vox) Colour Balance, Image Technologies, and Cognitive Equity, Roth How Photography Was Optimized for White Skin Color (Priceonomics)
  218. 218. As a result, photos featuring people with light skin looked fairly accurate. SKIN TONE IN PHOTOGRAPHY SOURCES Color film was built for white people. Here's what it did to dark skin. (Vox) Colour Balance, Image Technologies, and Cognitive Equity, Roth How Photography Was Optimized for White Skin Color (Priceonomics) Film Kodachrome Year 1970 Credit Darren Davis, Flickr
  219. 219. Photos featuring people with darker skin, not so much... SKIN TONE IN PHOTOGRAPHY SOURCES Color film was built for white people. Here's what it did to dark skin. (Vox) Colour Balance, Image Technologies, and Cognitive Equity, Roth How Photography Was Optimized for White Skin Color (Priceonomics) Film Kodachrome Year 1958 Credit Peter Roome, Flickr
  220. 220. Google Clips
  221. 221. Google Clips "We created controlled datasets by sampling subjects from different genders and skin tones in a balanced manner, while keeping variables like content type, duration, and environmental conditions constant. We then used this dataset to test that our algorithms had similar performance when applied to different groups." https://ai.googleblog.com/2018/05/automat ic-photography-with-google-clips.html
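A hedged sketch of that evaluation recipe: draw a test set balanced across (gender, skin tone) groups and compare a quality metric per group. The dataframe columns, label, and metric are assumptions for illustration, not Google's actual pipeline.

```python
# Balanced-evaluation sketch: equal representation per group, then per-group quality.
import pandas as pd

def balanced_sample(df, group_cols, n_per_group, seed=0):
    """Equal-sized sample per group; other variables are left untouched."""
    return (df.groupby(group_cols, group_keys=False)
              .apply(lambda g: g.sample(n=min(len(g), n_per_group), random_state=seed)))

def per_group_metric(df, group_cols, label_col, pred_col):
    """Simple accuracy per group; swap in whatever metric matters for the product."""
    return (df.assign(correct=lambda d: d[label_col] == d[pred_col])
              .groupby(group_cols)["correct"].mean())

clips = pd.read_csv("clip_eval.csv")            # hypothetical labeled evaluation clips
test = balanced_sample(clips, ["gender", "skin_tone"], n_per_group=200)
print(per_group_metric(test, ["gender", "skin_tone"], "is_interesting", "model_pred"))
```

Large gaps between groups in the printed table are the signal that the quote above is testing for.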
  222. 222. Geena Davis Inclusion Quotient [with Geena Davis Institute on Gender in Media]
  223. 223. Machine Translation
  224. 224. (Historical) Gender Pronouns in Translate
  225. 225. Three Step Approach
  226. 226. 1. Detect Gender-Neutral Queries Train a text classifier to detect when a Turkish query is gender-neutral. • trained on thousands of human-rated Turkish examples
  227. 227. 2. Generate Gender-Specific Translations • Training: Modify training data to add an additional input token specifying the required gender: • (<2MALE> O bir doktor, He is a doctor) • (<2FEMALE> O bir doktor, She is a doctor) • Deployment: If step (1) predicted query is gender-neutral, add male and female tokens to query • O bir doktor -> {<2MALE> O bir doktor, <2FEMALE> O bir doktor}
  228. 228. 3. Check for Accuracy Verify that: 1. the requested feminine translation is feminine; 2. the requested masculine translation is masculine; 3. the feminine and masculine translations are exactly equivalent, with the exception of gender-related changes.
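A toy sketch of steps 2 and 3: prefix the gender tokens when a (hypothetical) gender-neutrality classifier fires, then accept the pair of translations only if they differ solely in gender-related words. The classifier, translation model, and word list are placeholders, not Google's implementation.

```python
# Toy stand-ins: GENDERED maps masculine words to feminine counterparts.
GENDERED = {"he": "she", "him": "her", "his": "her", "himself": "herself"}

def gendered_translations(query, is_gender_neutral, translate):
    """Step 2 sketch: request both gendered translations for neutral queries."""
    if not is_gender_neutral(query):                 # step 1 (assumed classifier)
        return {"default": translate(query)}
    return {"masculine": translate("<2MALE> " + query),
            "feminine": translate("<2FEMALE> " + query)}

def consistent_up_to_gender(masc, fem):
    """Step 3 (simplified): translations must match except for gendered words."""
    m, f = masc.lower().split(), fem.lower().split()
    if len(m) != len(f):
        return False
    return all(a == b or GENDERED.get(a) == b for a, b in zip(m, f))

# Illustrative check with hard-coded outputs standing in for the NMT model:
print(consistent_up_to_gender("He is a doctor", "She is a doctor"))   # True
print(consistent_up_to_gender("He is a doctor", "She is a nurse"))    # False
```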
  229. 229. Result: Reduced Gender Bias in Translate
  230. 230. Smart Compose
  231. 231. Adversarial Testing for Smart Compose in Gmail
  232. 232. Adversarial Testing for Smart Compose in Gmail
  233. 233. Adversarial Testing for Smart Compose in Gmail
  234. 234. Adversarial Testing for Smart Compose in Gmail
  235. 235. Key Takeaways
  236. 236. Good ML Practices Go a Long Way • Lots of low-hanging fruit for improving fairness simply by using machine learning best practices: representative data, introspection tools, visualization tools, testing • Fairness improvements often lead to overall improvements • It’s a common misconception that there is always a tradeoff
  237. 237. Breadth and Depth Required • Looking end-to-end is critical: be aware of bias and potential problems at every stage of product and ML pipelines (from design and data gathering to deployment and monitoring) • Details matter: slight changes in features or labeler criteria can change the outcome • Must have experts who understand the effects of decisions • Many details are not technical, such as how labelers are hired
  238. 238. Process Best Practices (spanning policy and technology): identify product goals; get the right people in the room; identify stakeholders; select a fairness approach; analyze and evaluate your system; mitigate issues; monitor continuously and have escalation plans; auditing and transparency
  239. 239. Beyond Accuracy: Performance, Cost, Fairness and Bias, Privacy, Security, Safety, Robustness
  240. 240. The Real World is What Matters • Decisions should be made considering real-world goals and outcomes • You must have people involved who understand these real-world effects • Social scientists, lawyers, domain experts, … • Hire experts (even ones who don’t code) • You need different types of testing depending on the application • We need more research focused on people, applications, and real-world effects • A lot of current research is not that useful in practice • We need more social science + machine learning research
  241. 241. Key Open Problems in Applied Fairness
  242. 242. Key Open Problems in Applied Fairness • What if you don’t have the sensitive attributes? • When should you use which approach? For example, equal treatment vs. equal outcome? • How to identify harms? • Process for framing AI problems: will the chosen metrics lead to desired results? • How to tell if the data generation and collection method is appropriate for a task? (e.g., causal structure analysis?) • Processes for mitigating harms and misbehaviors quickly
  243. 243. Related Tutorials / Resources • Sara Hajian, Francesco Bonchi, and Carlos Castillo, Algorithmic bias: From discrimination discovery to fairness-aware data mining, KDD Tutorial, 2016. • Solon Barocas and Moritz Hardt, Fairness in machine learning, NeurIPS Tutorial, 2017. • Kate Crawford, The Trouble with Bias, NeurIPS Keynote, 2017. • Arvind Narayanan, 21 fairness definitions and their politics, FAT* Tutorial, 2018. • Sam Corbett-Davies and Sharad Goel, Defining and Designing Fair Algorithms, Tutorials at EC 2018 and ICML 2018. • Ben Hutchinson and Margaret Mitchell, Translation Tutorial: A History of Quantitative Fairness in Testing, FAT* Tutorial, 2019. • Henriette Cramer, Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miroslav Dudík, Hanna Wallach, Sravana Reddy, and Jean Garcia-Gathright, Translation Tutorial: Challenges of incorporating algorithmic fairness into industry practice, FAT* Tutorial, 2019. • ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*)
  244. 244. Fairness Privacy Transparency Explainability Related KDD’19 sessions: 1.Tutorial: Explainable AI in Industry (Sun, 1-5pm) 2.Workshop: Explainable AI/ML (XAI) for Accountability, Fairness, and Transparency (Mon) 3.Social Impact Workshop (Wed, 8:15 – 11:45) 4.Keynote: Cynthia Rudin, Do Simpler Models Exist and How Can We Find Them? (Thu, 8-9am) 5.Research Track Session RT17: Interpretability (Thu, 10-12) 6.Several papers on fairness (e.g., ADS7 (Thu, 10-12), ADS9 (Thu, 1:30-3:30))
  245. 245. Thanks! Questions? • Tutorial website: https://sites.google.com/view/kdd19-fairness-tutorial • Feedback most welcome :) • slbird@microsoft.com, benhutch@google.com, kkenthapadi@linkedin.com, emrek@microsoft.com, mmitchellai@google.com