2013 DataCite Summer Meeting - Closing Keynote: Building Community Engagement on Research Transparency and Data Citation (George Alter - ICPSR)
7 Oct 2013
2013 DataCite Summer Meeting - Making Research better
DataCite. Co-sponsored by CODATA.
Thursday, 19 September 2013 at 13:00 - Friday, 20 September 2013 at 12:30
Washington, DC. National Academy of Sciences
http://datacite.eventbrite.co.uk/
1. Data Access and Research Transparency: A Data Repository View
George Alter
ICPSR
University of Michigan
2. About the Inter-university Consortium for Political and Social Research (ICPSR)
Mission: ICPSR provides leadership and training in data access, curation, and methods of analysis for a diverse and expanding social science research community.
• Acquire and archive social science data
• Distribute data to researchers
• Preserve data for future generations
• Provide training in quantitative methods
3. ICPSR Then and Now
• ICPSR History
– Established in 1962 so that social scientists could share data
– Started as a partnership among 21 universities
– Data distributed on punched cards and then on magnetic reel-to-reel tape
• ICPSR Today
– More than 700 members
– 390+ U.S. institutions
– 46 national memberships
– 8,000+ data collections
– Direct downloads
– Online analysis
5. Data archiving and dissemination for more
than 20 federal and private agencies
9. “Building Community Engagement in
Data Citation and Open Access to
Data”
• Funded by Alfred P. Sloan Foundation
– Challenge Grants to improve data citation and
access
– Social science journals
– Domain repositories
10. “Building Community Engagement in Data
Citation and Open Access to Data”
• Challenge grants: 4 selected from 26
applications:
– Richard Ball and Norm Medeiros, "Replication of Empirical
Research: A Soup-to-Nuts Protocol for Documenting Data
Management and Analysis," Haverford College
– Thomas Carsey, "Implementing a Data Citation Workflow within
the State Politics and Policy Journal," University of North Carolina
at Chapel Hill
– Lisa Neidert, "OPEN Data Through a Restricted Data Portal," The
University of Michigan
– Jian Qin and Kevin Crowston, "Development and Dissemination
of a Capability Maturity Model for Research Data Management
Training and Performance Assessment," Syracuse University
11. Data Citation and Research Transparency Standards for the Social Sciences, June 13-14, 2013
• AERA Educational Evaluation and Policy Analysis
• American Economic Journal: Applied Economics
• American Economic Review
• American Educational Research Association
• American Journal of Political Science
• American Journal of Sociology
• American Psychological Association
• American Sociological Review
• American Statistical Association
• Archives of Scientific Psychology
• Demography
• Institute for Quantitative Social Science, Harvard University
• Journal of Politics
• MIT Libraries
• Society for Research on Educational Effectiveness
• State Politics and Policy Quarterly
12. Sustaining Domain Repositories for Digital Data, June 24-25, 2013
• Association of Religion Data Archives
• CIESIN
• Cultural Policy and the Arts National Data Archive
• Data Conservancy
• DataONE
• Databrary
• Dryad
• Human Relations Area Files
• Linguistic Data Consortium
• National Academies of Science
• National Snow and Ice Data Center
• Odum Institute
• Roper Center
• SEAD
• tDAR Digital Archaeological Record
• UCLA Data Archive
• University of Michigan Transportation Research Institute
• US Virtual Astronomical Observatory
• Worldwide Protein Data Bank
14. What do we know about sharing of social
science data?
15. Source: Pienta, Amy, Myron Gutmann, & Jared Lyle. 2009. “Research Data in The Social Sciences: How Much is Being Shared?” Research
Conference on Research Integrity, Niagara Falls, NY.
Most data are not shared.
16. Median # of Publications by Data Sharing Status

                                  Data Archived   Data Shared Informally   Data Not Shared
                                  (n=111)         (n=415)                  (n=409)
Primary PI Pubs (median)          6               6                        3
Secondary Pubs, No PI (median)    8               6                        3
Pubs with Students (median)       4               3                        1
Total                             18              15                       7
Source: Pienta, Amy M., George Alter, and Jared Lyle. 2010. “The Enduring Value of Social Science Research: The Use and Reuse of
Primary Research Data.” Presented at the BRICK, DIME, STRIKE Workshop, The Organisation, Economics, and Policy of Scientific
Research, Turin, Italy, April 23‐24, 2010 (http://hdl.handle.net/2027.42/78307)
Shared Data Produce More Publications
17. Why don’t researchers share their
data?
The usual suspects:
• I don’t have time.
• My grant doesn’t pay for it.
• It will be used incorrectly.
• Someone might scoop me with my own data.
Our usual replies:
• You will get credit for sharing.
• More research will be done.
• Transparency and replication are good for science.
19. What are the weak points in this story?
Will Researcher 1 deposit the data?
Will Researcher 2 cite the data?
20. Publication as Seen by a Researcher
Researcher 1 collects data and publishes an article.
Researcher 1 is rewarded.
45. Achieving Data Access and Research Transparency:
• Enforcement by funding agencies
• Ethics codes from Professional Associations
• Author guidelines from Journals
• Enforcement by journals
46. Why should funding agencies require
data sharing?
• Data re-use is a more efficient use of funds
– Collecting data is expensive
– Data that are shared produce more science
• Funding agencies are the biggest beneficiaries of data
citation.
• Political winds favor open data
47. Reproducibility should be the gold standard that all peer reviewers and editors aim
for when assessing whether a manuscript has supplied sufficient information to
allow others to repeat and build on the experiments. As such, the presumption must
be that, unless there is a strong reason otherwise, data should be fully disclosed and
made publicly available. In line with this principle, data associated with all publicly
funded research should, where possible, be made widely and freely available. The
work of researchers who expend time and effort
adding value to their data, to make it usable by others, should be acknowledged and
encouraged.
House of Commons, Science and Technology Committee - Eighth Report of Session 2010-12, Peer review in scientific publications. Ordered by the House of Commons to be printed 18 July 2011.
http://www.publications.parliament.uk/pa/cm201012/cmselect/cmsctech/856/856.pdf
Transparency and reproducibility
are politically popular
48. The White House has mandated public
access to federally funded data
49. Congress favors open access to data
“The growing lack of scientific integrity and transparency has many causes but one thing is
very clear: without open access to data, there can be neither integrity nor transparency from
the conclusions reached by the scientific community. Furthermore, when there is no reliable
access to data, the progress of science is impeded and leads to inefficiencies in the scientific
discovery process. Important results cannot be verified, and confidence in scientific claims
dwindles.”
Statement of Research Subcommittee Chairman Larry Bucshon (R-Ind.) Hearing
on Scientific Integrity and Transparency, March 5, 2013.
Open data has bipartisan support!
50. National Institutes of Health,
Data and Informatics Working Group
Draft Report to The Advisory Committee to the Director,
June 15, 2012
Recommendation 1: Promote Data Sharing Through
Central and Federated Catalogues
1a. Establish a Minimal Metadata Framework for Data
Sharing
1b. Create Catalogues and Tools to Facilitate Data
Sharing
1c. Enhance and Incentivize a Data Sharing Policy for
NIH-Funded Data
51. What is motivating Professional
Associations and Journals?
• Concern about legitimacy
– Cases of fraud and misuse of data
– Failures of replication
– Public attacks on science
55. How can Professional Associations and
Journals respond?
• Professional associations
– Ethics guidelines that emphasize data access and
research transparency
• Journals
– Data citation guidelines
– Data access policies
• Replication data
• Codes and scripts
– Journals worry about
• Cost
• Compliance
• Competition
56. Improving Data Citation in Journals
Data-PASS letter to the American Sociological Association, August 8, 2010
Similar letters were sent to the American Economic Association, American Educational Research Association, and American Political Science Association.
57. Data Citation
References for data sets should include a
persistent identifier, such as a Digital Object
Identifier (DOI). Persistent identifiers ensure
future access to unique published digital
objects, such as a text or data set. Persistent
identifiers are assigned to data sets by digital
archives, such as institutional repositories and
partners in the Data Preservation Alliance for the
Social Sciences (Data-PASS).
58. American Political Science Association
“Guide to Professional Ethics”
October 2012
6. Researchers have an ethical obligation to facilitate the evaluation of their
evidence-based knowledge claims through data access, production
transparency, and analytic transparency so that their work can be tested or
replicated.
6.1 Data access: Researchers making evidence-based knowledge claims
should reference the data they used to make those claims. If these are data
they themselves generated or collected, researchers should provide access to
those data or explain why they cannot.
6.2 Production transparency: Researchers providing access to data they
themselves generated or collected, should offer a full account of the
procedures used to collect or generate the data.
6.3 Analytic Transparency: Researchers making evidence-based knowledge
claims should provide a full account of how they draw their analytic
conclusions from the data, i.e., clearly explicate the links connecting data to
conclusions.
American Political Science Association Guide to Professional Ethics, Rights and
Freedoms
59. The American Economic Review: Data Availability Policy
It is the policy of the American Economic Review to publish papers only if the data
used in the analysis are clearly and precisely documented and are readily available to
any researcher for purposes of replication. Authors of accepted papers that contain
empirical work, simulations, or experimental work must provide to the Review, prior
to publication, the data, programs, and other details of the computations sufficient
to permit replication. These will be posted on the AER Web site. The Editor should be
notified at the time of submission if the data used in a paper are proprietary or if, for
some other reason, the requirements above cannot be met.
As soon as possible after acceptance, authors are expected to send their
data, programs, and sufficient details to permit replication, in electronic form, to the
AER office.
…
If a request for an exemption based on proprietary data is made, authors should
inform the editors if the data can be accessed or obtained in some other way by
independent researchers for purposes of replication. Authors are also asked to
provide information on how the proprietary data can be obtained by others in their
Readme PDF file. A copy of the programs used to create the final results is still
required.
60. Concluding thoughts
• Changing researcher behavior is difficult
• The rewards of data citation are not enough
• Funding agencies and Journals
– have the greatest leverage for changing behavior
– are sympathetic to data access and transparency
61. What can we do?
Funding agencies
• Fund data stewardship
– Researchers should not be faced with a tradeoff
between their scientific aims and data stewardship
• Enforce data management plans
• Improve funding of data repositories
– Recognize data repositories as scientific infrastructure
– Develop relevant evaluation criteria
62. What can we do?
Journals
• Guidelines to authors should include
– Data access policies
– Data citation policies
– Persistent identifiers for data
– Examples
• Keep it simple
– Focus on key elements: Author, Title, Date,
Location (i.e. persistent identifier)
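As an illustration of the four key elements above, here is a minimal Python sketch that assembles a data citation from Author, Title, Date, and a persistent identifier. The class name, the field names, the ordering of the elements, and the sample values are all assumptions made for this example; the DOI shown is a placeholder, not a real record, and no particular journal style is implied.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Hypothetical container for the four key elements of a data citation."""
    author: str
    title: str
    date: str        # publication year or full date
    identifier: str  # persistent identifier, e.g. a DOI

    def format(self) -> str:
        # Bare DOIs start with "10."; turn them into resolvable doi.org links.
        location = self.identifier
        if location.startswith("10."):
            location = f"https://doi.org/{location}"
        return f"{self.author}. {self.date}. {self.title}. {location}"


citation = DataCitation(
    author="Doe, Jane",
    title="Example Survey of Households [dataset]",
    date="2013",
    identifier="10.1234/example-dataset",  # placeholder DOI for illustration
)
print(citation.format())
```

Keeping the identifier as its own field, rather than burying it in free text, is what lets a journal or repository check that every data reference actually carries one.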
63. What can we do?
Data Archiving Community
• See the whole picture
• Train researchers in data management
– See Ball and Medeiros, “Teaching Students to
Document Empirical Research” on YouTube
• Reduce the costs of capturing metadata in
scientific workflows
• Rate journals on their policies and
performance