Evan Estola, Lead Machine Learning Engineer, Meetup, at MLconf NYC 2017


Evan is a Lead Machine Learning Engineer working on the Data Team at Meetup. Combining product design, machine learning research and software engineering, Evan builds systems that help Meetup’s members find the best thing in the world: real local community. Before Meetup, Evan worked on hotel recommendations at Orbitz Worldwide, and he began his career in the Information Retrieval Lab at the Illinois Institute of Technology.

Abstract Summary: Machine Learning Heresy and the Church of Optimality

As Machine Learning continues to grow in both usage and impact on people’s lives, there has been a growing concern around the ethics of using these systems. In application areas such as hiring selection, loan review, and even prison sentencing, ML is being used in ways that raise questions about the fairness of these algorithms. But what does it mean for an algorithm to be fair? An algorithm will consistently make the same decision when given the same data, leading some people to argue that building an optimal algorithm is inherently fair. Even in the case of using sensitive features like age, race and gender, if the data is predictive, aren’t we just modeling reality?

In this talk, I will argue that these questions do not let us off the hook with regard to the impact of the systems we build as Machine Learning engineers. I think it is important to question the nature of how ‘optimal’ a model can even be in the first place. Finally, I will discuss what kinds of organizational resistance engineers might run into, and how to deal with questionable ethical decisions for the sake of being ‘optimal’.


Machine Learning Heresy and the Church of Optimality
Evan Estola
MLconf
3/24/17

About Me
● Evan Estola
● Staff Machine Learning Engineer, Data Team Lead @ Meetup
● evan@meetup.com
● @estola

Meetup
● Do more of what’s most important to you
● 270,000 Meetups, ~30 million members
● Recommendations
○ Cold Start
○ Sparsity
○ Lies

Data Science impacts lives
● Ads you see
● Friend’s Activity/Facebook feed
● News you’re exposed to
● If a product is available
● If you can get a ride
● Price you pay for things
● Admittance into college
● If you can get a loan
● Job openings you find
● Job openings you can get
● Punishment for crime

You just wanted a kitchen scale, now Amazon thinks you’re a drug dealer

● “Black-sounding” names 25% more likely to be served ad suggesting criminal record

● Fake profiles, track ads
● Career coaching for “200k+” Executive jobs Ad
● Male group: 1852 impressions
● Female group: 318

● Twitter bot
● “Garbage in, garbage out”
● Responsibility?
“In the span of 15 hours Tay referred to feminism as a "cult" and a "cancer," as well as noting "gender equality = feminism" and "i love feminism now." Tweeting "Bruce Jenner" at the bot got similar mixed response, ranging from "caitlyn jenner is a hero & is a stunning, beautiful woman!" to the transphobic "caitlyn jenner isn't a real woman yet she won woman of the year?"”
Tay.ai

You know racist computers are a bad idea
Don’t let your company invent racist computers
@estola

Brief Math Aside
● Summary statistics are crap on multimodal distributions
● “there is no presently generally agreed summary statistic (or set of statistics) to quantify the parameters of a general bimodal distribution”
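
The point about summary statistics can be illustrated with a minimal sketch. The data below is synthetic and the cluster centres (20 and 60) are assumptions chosen for illustration; nothing here comes from the talk's own data. It shows that the mean of a bimodal sample can land in a region where no observations exist.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical bimodal data: two equally sized clusters centred at 20 and 60.
low_mode = [random.gauss(20, 2) for _ in range(1000)]
high_mode = [random.gauss(60, 2) for _ in range(1000)]
samples = low_mode + high_mode

mean = statistics.mean(samples)
median = statistics.median(samples)

# Count how many observations actually fall near the mean.
near_mean = sum(1 for x in samples if abs(x - mean) < 5)

print(f"mean={mean:.1f} median={median:.1f} points within 5 of mean={near_mean}")
```

Both the mean and the median sit in the empty gap between the two clusters, so a model or report that summarizes this population by a single "typical" value describes nobody in it.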

By restricting or removing certain features, aren’t you sacrificing performance?
Isn’t it actually adding bias if you decide which features to put in or not?
If the data shows that there is a relationship between X and Y, isn’t that your ground truth?
Isn’t that sub-optimal?

Bad Features
● Not all features are ok!
○ ‘Time travelling’
■ Rating a movie => watched the movie
■ Went to a Meetup => joined the Meetup
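
A rough sketch of why a ‘time travelling’ feature is misleading, using synthetic data. The field names (`topic_match`, `attended`, `joined`) and all probabilities are hypothetical and chosen only for illustration. Attendance is recorded only after a member has joined, so it looks highly predictive offline while being unavailable when a recommendation is actually made.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def make_member_group_pair():
    """Return one hypothetical (member, group) training row."""
    topic_match = random.random() < 0.5
    # Joining is somewhat more likely when topics match.
    joined = random.random() < (0.6 if topic_match else 0.3)
    # 'Time travelling' feature: attendance is only recorded AFTER joining,
    # so it is unavailable at the moment a recommendation is made.
    attended = joined and random.random() < 0.9
    return {"topic_match": topic_match, "attended": attended, "joined": joined}

rows = [make_member_group_pair() for _ in range(5000)]

def accuracy(feature):
    """Accuracy of the one-feature rule: predict joined iff feature is true."""
    hits = sum(1 for r in rows if r[feature] == r["joined"])
    return hits / len(rows)

acc_honest = accuracy("topic_match")
acc_leaky = accuracy("attended")
print(f"topic_match: {acc_honest:.2f}  attended (leaky): {acc_leaky:.2f}")
```

The leaky feature wins by a wide margin offline, which is exactly why it should be excluded: removing it lowers the measured score without making the deployed model any worse.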

Benign Features
● Not all features are useful!
○ Member-only features don’t affect ranking (in simple models)
○ Clicked an email => likely to join/rsvp/etc.
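
The claim about member-only features can be checked with a small sketch, assuming the "simple model" is a linear scorer. The weights, feature names, and candidate groups below are all hypothetical. In a linear model, a feature that depends only on the member adds the same constant to every candidate's score, so the order of the candidates does not change.

```python
def score(weights, member_features, group_features):
    """Linear score: weighted sum over member-only and group features."""
    member_part = sum(weights[k] * v for k, v in member_features.items())
    group_part = sum(weights[k] * v for k, v in group_features.items())
    return member_part + group_part

# Hypothetical weights and candidate groups, for illustration only.
weights = {
    "member_tenure": -0.3,
    "member_group_count": 0.8,
    "group_size": 0.5,
    "topic_match": 2.0,
}
candidates = {
    "hiking": {"group_size": 1.2, "topic_match": 0.9},
    "python": {"group_size": 0.4, "topic_match": 1.0},
    "chess": {"group_size": 0.7, "topic_match": 0.1},
}

def rank(member_features):
    """Return group names ordered from highest to lowest score."""
    return sorted(
        candidates,
        key=lambda g: score(weights, member_features, candidates[g]),
        reverse=True,
    )

# Two very different members get the same ordering of groups.
ranking_a = rank({"member_tenure": 1, "member_group_count": 2})
ranking_b = rank({"member_tenure": 12, "member_group_count": 40})
print(ranking_a, ranking_b)
```

This only holds without interaction terms. Once a model can cross a member feature with a group feature (as trees or neural networks do), member-only features can change the ranking.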

“It’s difficult to make predictions, especially about the future”

Misguided Models
● Offline performance != Online performance
● Predicting past behavior != Influencing behavior
● Clicks vs. buy behavior in ads

“Computers are useless, they can only give you answers”

Asking the right questions
● Need a human
○ Choosing features
○ Choosing the right target variable
○ Value-added ML

Asking the right questions
● Need a human
○ Auto-ethics
■ Tramer, FairTest
■ Defining un-ethical features
■ Who decides to look for fairness in the first place?

https://research.google.com/bigpicture/attacking-discrimination-in-ml/

Example
● Questionable real-world applications
○ Screen job applications
○ Screen college applications
○ Predict salary
○ Predict recidivism
● Features?
○ Race
○ Gender
○ Age

Correlating features
● Name -> Gender
● Name -> Age
● Grad Year -> Age
● Zip -> Socioeconomic Class
● Zip -> Race
● Likes -> Age, Gender, Race, Sexual Orientation...
● Credit score, SAT score, College prestigiousness...
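
A minimal sketch of why dropping a sensitive column is not enough when a correlated feature remains. The population, the zip codes, and the 85% residential split are assumptions made up for illustration. The sensitive attribute is removed, then recovered from zip code alone with a majority-vote lookup.

```python
import random
from collections import Counter, defaultdict

random.seed(2)  # fixed seed so the sketch is reproducible

def make_person():
    """Return a hypothetical (sensitive_group, zip_code) pair."""
    group = random.choice(["A", "B"])
    # Assumption for illustration: 85% of each group lives in "its" zip codes.
    home_zips = ["10001", "10002"] if group == "A" else ["20001", "20002"]
    other_zips = ["20001", "20002"] if group == "A" else ["10001", "10002"]
    zip_code = random.choice(home_zips if random.random() < 0.85 else other_zips)
    return group, zip_code

people = [make_person() for _ in range(4000)]

# "Drop" the sensitive column, then try to recover it from zip code alone
# with a majority-vote lookup table (evaluated in-sample for simplicity).
by_zip = defaultdict(Counter)
for group, zip_code in people:
    by_zip[zip_code][group] += 1
guess = {z: counts.most_common(1)[0][0] for z, counts in by_zip.items()}

recovered = sum(1 for group, z in people if guess[z] == group) / len(people)
print(f"sensitive attribute recovered from zip alone: {recovered:.0%}")
```

Any model given the zip code has access to roughly the same information, so auditing for fairness has to look at outcomes per group, not just at which columns were fed in.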

At your job...
Not everyone will have the same ethical values, but you don’t have to take ‘optimality’ as an argument against doing the right thing.

“All models are wrong, but some are useful”
Your model is already biased, it will never be optimal. Don’t turn wisdom into heresy.
@estola
