The document discusses new challenges for ethicists and practitioners regarding research ethics principles like privacy, consent, and harm in the context of new technologies and large datasets. It notes that emerging technologies often lead to conceptual gaps in how we think about ethics and policy vacuums for addressing issues. For ethicists, there are new gaps around fundamental research ethics principles. For practitioners like data librarians, there are challenges in obtaining, storing, and sharing datasets in line with ethical standards. Addressing these issues will require reexamining assumptions about privacy, harm, consent and developing new policies.
Ethics in Library Research Data Services: Conceptual Gaps & Policy Vacuums
1. Michael Zimmer, PhD
Assistant Professor, School of Information Studies
Director, Center for Information Policy Research
University of Wisconsin-Milwaukee
www.MichaelZimmer.org
2. The emergence of new technologies and technological environments
often leads to CONCEPTUAL GAPS in how we think about ethical
problems, and POLICY VACUUMS in how we can address them
Computer technology transforms “many of our human activities and
social institutions,” and will “leave us with policy and conceptual
vacuums about how to use computer technology”
“Often, either no policies for conduct in these situations exist or
existing policies seem inadequate”
Jim Moor (1985). “What is Computer Ethics?”
Michael Zimmer | ALISE IE Webinar | August 30, 2016 2
3. As ETHICISTS, we’re faced with new conceptual gaps in how we
think about some of the most fundamental principles of
research ethics, like privacy, consent, and harm
As PRACTITIONERS, we’re faced with new policy vacuums about
how we are to help researchers obtain, store, and share
datasets
5. In 2006, AOL released over 20 million search queries from
658,000 of its users to the public in an attempt to support
academic research on search engine usage
Despite AOL’s attempts to anonymize the data, individual users
remained identifiable based solely on their search histories,
which included search terms matching users’ names, Social Security
numbers, addresses, phone numbers, and other personally
identifiable information.
Upon being identified by The New York Times based solely on
her search terms in the AOL database, a Georgia woman
exclaimed, “My goodness it’s my whole personal life…I had no
idea somebody was looking over my shoulder”
6. Harvard-based “Tastes, Ties, and Time” (T3) research project
sought to understand social network dynamics of large groups of
students
Worked with Facebook & an “anonymous” university to harvest
the Facebook profiles of an entire cohort of college freshmen
Repeated each year for their 4-year tenure
NSF mandated release of data, first wave in Sept 2008
“All the data is cleaned so you can’t
connect anyone to an identity”
7. But the dataset had unique cases and codes, making it trivial to
identify the “anonymous” university
It took me minimal effort to discern that the source was Harvard,
and thus the anonymity (and privacy) of subjects in the study was
jeopardized
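One way to see why unique cases undermine anonymity is to count how many records share each combination of seemingly harmless attributes. The sketch below is only an illustration under stated assumptions: the field names (`major`, `home_state`, `gender`), the sample records, and the helper functions are hypothetical and are not drawn from the T3 dataset or its codebook. It shows a simple k-anonymity style check, not the method used to identify the source.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group_size(records, quasi_identifiers):
    """Return k, the size of the smallest group; k == 1 means someone is unique."""
    return min(group_sizes(records, quasi_identifiers).values())


def unique_records(records, quasi_identifiers):
    """Return the records whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == 1
    ]


# Hypothetical, made-up records used only to illustrate the check.
records = [
    {"major": "Economics", "home_state": "NY", "gender": "F"},
    {"major": "Economics", "home_state": "NY", "gender": "F"},
    {"major": "Sanskrit", "home_state": "MT", "gender": "M"},
]
fields = ["major", "home_state", "gender"]

k = smallest_group_size(records, fields)
at_risk = unique_records(records, fields)
```

Here the single record with a rare major and home state stands alone in its group, so removing names does not protect that person: anyone who knows those few facts can single the record out.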
8. Deal announced in 2010 that U.S. Library of Congress will archive
all public tweets
At the time of the announcement, this meant about 50 million new tweets
per day; by January 2013 the archive held approximately 170 billion tweets
6-month delay for new tweets; access restricted to researchers only
Open questions:
Can users opt out of being in the permanent archive?
Can users delete tweets from the archive?
Will geolocation and other metadata be included?
What about a public tweet that is re-tweeting a private one?
Did users ever expect their tweets to become a permanent part of the
LOC’s archives?
Six years later, the archive is still not available
9. Danish student researcher publicly released a dataset of nearly
70,000 users of the online dating site OkCupid, including
usernames, age, gender, location, what kind of relationship (or
sex) they’re interested in, personality traits, and answers to
thousands of profiling questions used by the site
11. As ETHICISTS, we’re faced with new conceptual gaps in how we
think about some of the most fundamental principles of
research ethics, like privacy, consent, and harm
As PRACTITIONERS, we’re faced with new policy vacuums about
how we are to help researchers obtain, store, and share
datasets
12. Presumption that because subjects make information available
on OkCupid, Facebook, or Twitter, they don’t have an
expectation of privacy
Researchers/IRBs might assume everything is always public, and was
meant to be
Assumes no harm could come to subjects if data is already “public”
New ethical problems…
Need to track if ToS/architecture have changed, or if users even
understand what is available to researchers
Ignores contextual nature of sharing
Fails to recognize that the strict dichotomy of public/private doesn’t
apply in a world of social & big data sets
13. Presumption that because something is shared or available
within a community, the subject is consenting to it being
harvested for research
Assumes users understand Terms of Service that might mention
“research”
Assumes no harm can come from use of data already shared with
friends or other contextually-bound circles
New ethical problems…
Must recognize that a user making something public online comes
with a set of assumptions/expectations about who can access and
under what conditions
Must recognize how research methods might allow unanticipated
access to “restricted” data
Users might not understand the technical conditions that enable
access to their data, nor the legal complexities of ToS agreements
14. Presumption that “harm” means risk of physical or tangible
impact on subject
Researchers often imply “data is already public, so what harm could
possibly happen?”
New ethical problems…
Must move beyond the concept of harm as requiring a tangible
consequence
Protecting from harm is more than protecting from hackers, spammers,
identity thieves, etc.
Consider dignity/autonomy theories of harm
Must a “wrong” occur for there to be damage to the subject?
Do subjects deserve control over the use of their data streams?
15. As ETHICISTS, we’re faced with new conceptual gaps in how we
think about some of the most fundamental principles of
research ethics, like privacy, consent, and harm
16. “With the appearance of big data,
open data, and particularly research
data curation on many libraries’
radar screens, data service has
become a critically important topic
for academic libraries”
17. As ETHICISTS, we’re faced with new conceptual gaps in how we
think about some of the most fundamental principles of
research ethics, like privacy, consent, and harm
As PRACTITIONERS, we’re faced with new policy vacuums about
how we are to help researchers obtain, store, and share
datasets
18. Data librarians might be tasked with assisting in obtaining data
sets for big data research
Searching repositories for existing data sets shared by others
Assisting with tools that scrape and collect data online
New challenges:
How do you confirm the provenance of data collected by others?
Should a data librarian help locate controversial datasets?
Can you use data that was later pulled from public accessibility?
Data that was hacked or stolen?
Can we ensure research subjects are protected when scraping data with
ad hoc scraping tools?
Should a data librarian require proof of IRB approval before assisting?
19. Data librarians are commonly asked to help with storing and
archiving research data
Maintain institutional data repository
Assist with drafting data management plans
New challenges:
Should library policy require de-identification of data prior to
storing?
What kind of security must be in place? Simple access controls, or
full data encryption?
Should any data be destroyed, rather than stored? How soon, and in
what way?
20. Data librarians might act as gatekeepers for making institutional
data sets available to others
New challenges:
What kind of access policies should be in place? What kind of
limitations might be reasonable?
Should data be “scrubbed” or de-identified prior to making it
publicly available?
How do we ensure secondary use is aligned with justification for the
initial collection of data?
21. As ETHICISTS, we’re faced with new conceptual gaps in how we
think about some of the most fundamental principles of
research ethics, like privacy, consent, and harm
As PRACTITIONERS, we’re faced with new policy vacuums about
how we are to help researchers obtain, store, and share
datasets
22. Buchanan, E., and C. Ess (2009). Internet Research Ethics and the Institutional
Review Board: Current Practices and Issues. Computers and Society, 39(3), 43–49.
Buchanan, E., and M. Zimmer (2016, Fall). “Internet Research Ethics”. The
Stanford Encyclopedia of Philosophy.
http://plato.stanford.edu/archives/fall2016/entries/ethics-internet-research/
Markham, A., and E. Buchanan (2012). Ethical Decision-Making and Internet
Research: Recommendations from the AoIR Ethics Working Committee
(Version 2.0). Association of Internet Researchers.
http://aoir.org/reports/ethics2.pdf
Secretary’s Advisory Committee to the Office for Human Research Protections
(SACHRP). “Considerations and Recommendations Concerning Internet
Research and Human Subjects Research Regulations, with Revisions”.
http://www.hhs.gov/ohrp/sites/default/files/ohrp/sachrp/mtgings/2013%20March%20Mtg/internet_research.pdf
Zimmer, M. (2010). “But the data is already public”: On the ethics of research
in Facebook. Ethics and Information Technology, 12(4), 313–325.
Zimmer, M., and K. Kinder-Kurlanda (eds.) (forthcoming). Internet Research
Ethics for the Social Age: New Challenges, Cases, and Contexts. New York:
Peter Lang Publishing.
Editor's notes
Researchers are hoping to advance our understanding of a phenomenon by making publicly available large datasets of user information they considered already in the public domain