1. Practicing Data Science Responsibly
Rahul Bhargava
Research Scientist
Civic Media Group - MIT Media Lab
rahulb@media.mit.edu
@rahulbot
2. Deloitte Data Science Forum
June 2, 2016
Rahul Bhargava
MIT Center for Civic Media
rahulb@mit.edu
@rahulbot
3.
Discrimination in Online Ad Delivery
Latanya Sweeney, Harvard University

Riding with the Stars
Anthony Tockar, Neustar, 2014

InCoding
Joy Buolamwini, MIT Media Lab, 2016
4.
responsible data creation
- InCoding
- Facebook news feed
- Facebook emotion study
- US NSA SkyNet program
- etc.

responsible data impacts
- Google ad results
- Tiger Mom Tax (ProPublica)
- Amazon same-day delivery
- etc.

responsible data use
- NYC Taxi records
- OKCupid profiles
- etc.
5.
White House report recommendations:
- consumer bill of rights
- data breach legislation
- protect students
- identify discrimination
- revise ECPA
- and more…

Council on Big Data, Ethics & Society recommendations:
- new ethics review standards
- data-aware grant making
- case studies & curricula
- spaces to talk about this
- standards for data-sharing
- and more…
6.
Rahul’s recommendations:
- define & maintain your org’s values
- do algorithmic QA
- set up internal & external review boards
- innovate with others in your field to create norms
7.
Are you being responsible?

Turn to your neighbor and talk about how you are approaching this. Do you have strategies for being responsible in the creation, impact, and use of your data science work? What is working for you, and what isn’t?
Editor’s notes
I’m Rahul Bhargava and I work on data literacy at the Center for Civic Media
lots of talk of this in the humanitarian space, but less in the corporate world
borrowing “responsible” framing from my friends at the Engine Room
Probably heard about Facebook newsfeed flare-up
I don’t want to debate whether Facebook is being “responsible” or not, but there is clearly a societal expectation of responsibility
People think algorithms are neutral, but they very much are not
Algorithms are artifacts of the cultural context of their creators and the world in which they operate
This can be risky. Three irresponsible examples
* machine learning algorithms trained on white people (Joy photo) - the “coded gaze”
* amazon and delivery to low-income neighborhoods
* de-anonymized NYC Taxi records and differential privacy
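The differential privacy fix that Tockar proposed for cases like the taxi records can be sketched with the Laplace mechanism: publish noisy aggregate counts instead of raw trip records. This is an illustrative sketch only; the function names and numbers below are invented for the example, not taken from the talk.

```python
# Sketch of the Laplace mechanism from differential privacy:
# release aggregate counts with calibrated noise instead of
# raw, re-identifiable records (e.g. NYC taxi trips).
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution
    via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count: int, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    One rider joining or leaving the dataset changes a count by
    at most 1 (sensitivity = 1), so the noise scale is 1/epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon)

# Example: publish how many pickups happened at one location,
# rather than the record-level trip data.
noisy = private_count(true_count=128, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; the analyst trades accuracy for protection explicitly rather than hoping "anonymized" records cannot be re-identified.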
I like to think about this as be responsible in three different ways:
Responsible data creation
Responsible data impacts
Responsible data use
Spend a little time describing each of these
So what do we do about this? Are there best practices or norms?
Little regulation in the US right now
White House wrote up a report with some recommendations
- bill of rights, data breach legislation standard, non-US, student data, stop discrimination, ECPA revision
Council on Big Data, Ethics & Society just released their report and recommendations for policy, pedagogy, network building, and further research
- common rule expand to data science, new approaches to ethics review
“ethics” is a scary word… especially when it comes to regulation
Use our existing corporate values to apply to data work: train your staff on this
Do what Latanya Sweeney and others (Christian Sandvig) are doing – algorithmic reverse engineering
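One concrete form this "algorithmic QA" can take is a disparate-impact check on a model's decisions across groups. The sketch below uses the common four-fifths rule of thumb; the data, names, and threshold are illustrative assumptions, not anything from Sweeney's or Sandvig's work.

```python
# Sketch of one algorithmic-QA check: compare a model's
# positive-outcome rates across two groups and flag large gaps
# (the "four-fifths" / disparate impact rule of thumb).

def selection_rate(outcomes):
    """Fraction of positive (1) decisions in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)

def disparate_impact_ratio(group_a, group_b):
    """Ratio of the lower selection rate to the higher one (0..1)."""
    ra, rb = selection_rate(group_a), selection_rate(group_b)
    return min(ra, rb) / max(ra, rb)

# Toy audit data: decisions the model made for two demographic groups.
group_a = [1, 1, 1, 0, 1, 1, 0, 1]   # selection rate 0.75
group_b = [1, 0, 0, 0, 1, 0, 0, 1]   # selection rate 0.375

ratio = disparate_impact_ratio(group_a, group_b)
flagged = ratio < 0.8   # below four-fifths: investigate further
```

Running checks like this on every model release is one way to turn "do algorithmic QA" from a slogan into a repeatable internal process.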
Set up an internal review board that any data-related projects need to be approved by
- I was just at a Stanford event with the folks from FB who do this