What Have Scientists Planned for Data Sharing and Reuse?
A Content Analysis of NSF Awardees’ Data Management Plans


        Renata Curty, Youngseek Kim & Dr. Jian Qin




RDAP13, Baltimore, 4-5 April 2013
Motivation

While the NSF mandate gives researchers plenty of flexibility to define
their own DMPs, and many academic institutions provide DMP writing
support, little is known about how scientists describe their data
management and sharing strategies in their DMPs.
Study Design
• Online survey: 20 questions

• Target population: NSF awardees of Standard Grants from January 18, 2011
  to November 5, 2012 (total: 16,065)

• Random sample: 1,606 cases
• Pilot study: 100 awardees (used to reformulate the survey)

• Final deployment: 966 awardees; 169 responses (17.5% response rate) and
  68 DMPs
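The percentages implicit in the study design can be checked with simple arithmetic. The 17.5% figure matches 169 responses out of the 966 awardees in the final deployment (169 out of the 1,606-case sample would be about 10.5%), and the sample is about 10% of the 16,065-award population. A minimal Python sketch; the helper name `rate` is hypothetical:

```python
def rate(part: int, whole: int) -> float:
    """Return part as a percentage of whole (hypothetical helper)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


POPULATION = 16065  # NSF standard-grant awardees, Jan 18 2011 to Nov 5 2012
SAMPLE = 1606       # random sample drawn from the population
DEPLOYED = 966      # awardees in the final deployment
RESPONSES = 169     # responses received

sampling_fraction = rate(SAMPLE, POPULATION)
response_rate = rate(RESPONSES, DEPLOYED)

print(f"Sampling fraction: {sampling_fraction:.1f}%")  # 10.0%
print(f"Response rate:     {response_rate:.1f}%")      # 17.5%
```

Dividing by the deployed count rather than the sample size is what reproduces the reported 17.5%.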
Awards Info
[Charts: Amount Awarded (N=166); NSF Directorate (N=166), with slices for BIO, CISE, EHR, ENG, GEO, MPS and SBE]
Awardees Info
Age (N=150): 25-34: 7%; 35-44: 41%; 45-54: 26%; 55-64: 19%; 65+: 7%
Organization Type (N=151): Academia 93%
Awardees Info
Position in Academia (N=143): Full Professor 40%; Associate Professor 28%; Assistant Professor 22%; Researcher 6.77%
Others: Dean (3), Professor Emeritus (1), Professor of Practice (1), Lecturer/Instructor (1), Post-Doctoral Fellow (1), Emeritus Senior Scientist, Director, Expert Consultant, Administrative Faculty Position, Chair.
Tenure status (N=138): Tenured 62%; On Tenure Track 25%; Non-Tenure Track 11%; Retired 2%
Geographical Distribution
[Map created with Google Fusion Tables; N=109]
DMP is important to formalize data sharing practices in science (N=166; mean = 4.93; SD = 1.62)
Writing a DMP for an NSF proposal is a challenging task (N=167; mean = 3.89; SD = 1.45)
DMP is difficult to execute (N=167; mean = 3.79; SD = 1.51)
Response scale: Strongly disagree, Disagree, Somewhat disagree, Neither agree nor disagree, Somewhat agree, Agree, Strongly agree
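Each statement on this slide comes with two summary figures (4.93 and 1.62 for the first), which read as a mean and a standard deviation on the seven-point scale. A minimal sketch of that computation, assuming responses are coded 1 (Strongly disagree) through 7 (Strongly agree) and that the spread is the sample standard deviation; the slides state neither, and the counts in the example are hypothetical, not survey data:

```python
from math import sqrt

# Assumed coding: position in this list + 1 is the numeric score (1..7).
SCALE = [
    "Strongly disagree", "Disagree", "Somewhat disagree",
    "Neither agree nor disagree", "Somewhat agree", "Agree", "Strongly agree",
]


def likert_summary(counts: list[int]) -> tuple[int, float, float]:
    """Return (N, mean, sample SD) from per-category counts ordered as SCALE."""
    if len(counts) != len(SCALE):
        raise ValueError("expected one count per scale point")
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two responses for a sample SD")
    scored = list(enumerate(counts, start=1))  # (score, count) pairs
    mean = sum(score * c for score, c in scored) / n
    var = sum(c * (score - mean) ** 2 for score, c in scored) / (n - 1)
    return n, mean, sqrt(var)


# Hypothetical counts for illustration only.
n, mean, sd = likert_summary([1, 0, 0, 2, 0, 0, 1])
print(f"N = {n}; mean = {mean:.2f}, SD = {sd:.2f}")  # N = 4; mean = 4.00, SD = 2.45
```

Replacing the divisor `n - 1` with `n` gives the population standard deviation; at N=166 the two differ by well under 1%.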
Types of Data
• 3D Models                 13.01% - 19
• Audio Files               12.33% - 18
• Curriculum Materials      21.23% - 31
• Data Models               27.40% - 40
• Field Notes               26.03% - 38
• Experimental Data         63.70% - 93
• Images                    36.99% - 54
• Interview Transcripts     17.12% - 25
• Patient Records            0.68% - 1
• Samples                   20.55% - 30
• Software                  35.62% - 52
• Spreadsheets              40.41% - 59
• Video Files               21.23% - 31
• Others: Computational Models, Surveys, DNA Sequences, Computer Codes, Crowdsourcing Data (Reviews)

Documentation of Data
Will follow:
• 46% - Disciplinary practices
• 37% - Research project’s needs
• 17% - Institutional recommendations/guidelines
N=158
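The Types of Data percentages add up to far more than 100%, so respondents could evidently pick several data types, and each percentage is a count divided by the number of respondents rather than by the number of selections. Every count/percentage pair on this slide is consistent with a base of 146 respondents (for example, 93 / 146 ≈ 63.70%); that base is inferred here, not stated on the slide. A short Python sketch reproducing the column under that assumption:

```python
# Counts from the Types of Data slide ("Others" omitted: it has no count).
COUNTS = {
    "3D Models": 19, "Audio Files": 18, "Curriculum Materials": 31,
    "Data Models": 40, "Field Notes": 38, "Experimental Data": 93,
    "Images": 54, "Interview Transcripts": 25, "Patient Records": 1,
    "Samples": 30, "Software": 52, "Spreadsheets": 59, "Video Files": 31,
}
RESPONDENTS = 146  # inferred base, not stated on the slide


def multi_select_shares(counts: dict[str, int], base: int) -> dict[str, float]:
    """Percentage of respondents choosing each option (options can overlap)."""
    return {option: round(100 * n / base, 2) for option, n in counts.items()}


shares = multi_select_shares(COUNTS, RESPONDENTS)
selections = sum(COUNTS.values())
print(shares["Experimental Data"])  # 63.7
print(f"{selections} selections by {RESPONDENTS} respondents, "
      f"{selections / RESPONDENTS:.1f} types each on average")
```

Because the denominator is respondents, the shares are not meant to sum to 100%.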
Challenges Encountered
• Appropriate infrastructure to archive/preserve data: 41%
• Lack of guidance from NSF: 36%
• Data description & documentation: 30%
• Lack of guidance from my institution: 29%
• None: 26%
• Which stage(s) of research to share the data: 25%
• Level of granularity of data: 25%
Others:
• Some projects do not generate data
• Conflict between DMP requirements and IRB requirements regarding social and behavioral research data
• Conflicts between intellectual property and data protection
• Long-term preservation issues
• Conflicts between individual/group and institutional strategies
N=169
Data Access & Availability
Open 45%; Available with some restrictions 51%; Restricted 5%

How Available:
• By email request                    45.52% - 61
• Personal website                    17.91% - 24
• Research Group/Project Website      51.49% - 69
• Institutional Repository            20.15% - 27
• Disciplinary Repository             32.84% - 44
• Others: “Publications”, “Available to NSF only”
N=167
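The percentages next to the sharing channels on this slide (By email request, Personal website, and so on) do not use N=167 as their denominator: 61 / 167 would be 36.5%, not 45.52%. Dividing each count by its percentage recovers the base actually used, which comes out at about 134 for every row; this is an inference from the slide's numbers, not a figure the authors report. A quick Python check:

```python
# (percentage, count) pairs from the sharing-channel list on this slide.
PAIRS = {
    "By email request": (45.52, 61),
    "Personal website": (17.91, 24),
    "Research Group/Project Website": (51.49, 69),
    "Institutional Repository": (20.15, 27),
    "Disciplinary Repository": (32.84, 44),
}


def implied_base(percent: float, count: int) -> float:
    """Denominator implied by a rounded percentage and its count."""
    return count / (percent / 100)


bases = {label: implied_base(p, c) for label, (p, c) in PAIRS.items()}
for label, base in bases.items():
    print(f"{label:32} {base:7.2f}")
```

Because the percentages are rounded to two decimals, the implied bases scatter by a few hundredths around 134 rather than matching it exactly.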
Barriers for Data Reuse
[Chart; N=164]
Reuse Issues - Privacy, Anonymity & Confidentiality
 “IRB restrictions on ability to share even deidentified data. Concern that sharing
 even deidentified data will discourage participation in the study.”

 “For myself, no. But for others to use my data, yes: for qualitative data, under IRB
 requirements for the protection of human subjects around confidentiality and
 anonymity, DMPs are nearly impossible to implement without perhaps some
 kind of temporal restriction on them (like, ‘This archive can only be opened in 20 -
 30 - 40 years’ or something like that)”

 “The project involves human subject; so protections have to be put in place that
 may limit reuse applications in the future.”

 “HIPAA [Health Insurance Portability and Accountability Act] issues - obtaining
 self reporting data on human subjects.”
Reuse Issues - Context, Time Factor & Documentation
“My past data was collected on a unique system built specifically for the research project.
Need lots of context to reuse the data.”

“The only problems I see is that data can be taken out of context in a way that produces
results that might not be correct.”

“Data is specific to testing scenarios. The insight gleaned from our experimental data is of
more importance than the data itself.”

“My data is for specific purposes and it is hard to conceive of how someone would use it for
something else/different. Even with a significant amount of metadata it would be difficult for
someone to know all the circumstances under which the data was collected and why it was
collected.”

“All scientific data is collected in particular context. Mechanisms that facilitate the description
of that context are lacking. The creation of metadata that provides this information is a
cumbersome, boring task and there are few resources available to ease the burden.”
Reuse Issues - Format, Tools, Infrastructure
• Interoperability & Standards
“Systems are always changing...It would be best if we could upload data to NSF so
that it will be publicly available in the same way NIST [National Institute of
Standards and Technology] publishes data.”

“Our raw data formats are extremely large, and need to be compressed into
reduced, on-line archives for sharing. It is not possible for me as an individual PI to
archive the raw data for others to examine.”

“My data is generally related to large software artifacts, so using it could involve
quite a bit of work to get those artifacts running. This is something that I explicitly
try to come up with solutions for in my DMPs.”

“Until NSF provides a free national repository for data archiving, we will not make
progress in this area. If such an archive was available, it would be sensible to
require researchers to place data there at the end of a grant and would allow other
researchers to take advantage of it in a practical way.”
DMPs – Preliminary Content Analysis
 • Coding Scheme
    - Used both deductive and inductive approaches
    - 35 codes
       - NSF DMP Policy and University of Virginia's Guideline
       - Emerged from DMP statements


 • Data Analysis Procedure
    - A total of 766 utterances were identified
    - 642 unique utterances
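The slide reports both a total (766) and a unique (642) utterance count. One plausible reading, assumed here rather than stated on the slide, is that an utterance assigned more than one code is counted once per code in the total but only once among the unique utterances. The Python sketch below tallies coded records under that assumption; the `Coding` record type and the example records are hypothetical:

```python
from collections import Counter
from typing import NamedTuple


class Coding(NamedTuple):
    """One coded excerpt (hypothetical record layout)."""
    dmp_id: str     # identifier of the DMP the statement came from
    utterance: str  # the statement as excerpted from the DMP
    code: str       # code assigned from the coding scheme


def summarize(codings: list[Coding]) -> tuple[int, int, Counter]:
    """Return (total coded utterances, unique utterances, frequency per code)."""
    total = len(codings)
    # Assumption: an utterance is unique per (DMP, text) pair, however many codes it got.
    unique = len({(c.dmp_id, c.utterance) for c in codings})
    per_code = Counter(c.code for c in codings)
    return total, unique, per_code


# Hypothetical records, not taken from the study's DMPs.
codings = [
    Coding("dmp-01", "Data will be shared upon request.", "How Available"),
    Coding("dmp-01", "Data will be shared upon request.", "Process for Gaining Access"),
    Coding("dmp-02", "Sequences will be deposited in GenBank.", "Which Repository"),
]
total, unique, per_code = summarize(codings)
print(f"{total} coded utterances, {unique} unique")  # 3 coded utterances, 2 unique
print(per_code["Which Repository"])  # 1
```

The `per_code` counter is the kind of per-code frequency that feeds a word cloud or a frequency column in a code table.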
DMPs’ Content
[Wordle word cloud generated from the number of each code across the 68 DMPs]
Coding Scheme
Types of Data: What to Generate; What Data Types; How to Create; Where to Get Existing Data
Metadata Standards: Data Format; Metadata Form; How to Create; Which Metadata Standard; Contextual Details Needed; Discoverability of the Data
Data Access & Sharing Process: When Available; How Available; What Available; Process for Gaining Access; How Long Retain the Right; Embargo Period; Ethical/Privacy Issues; Compliance with IRB Protocol; Whose Intellectual Property
Data Archiving Plan: Strategy for Archiving Data; Which Repository; Procedures for Long-Term Storage; Data Preservation Period; What Data Preserved for Long-Term; Transformation Required; Data Documentation; Related Information
Data Reuse Plan: Reusability of the Data; Restrictions to Access; Groups Interested In; Foreseeable Uses/Users
Others: Data Lifecycle; Data Curation; Budget
Types of Data
 Codes                          Freq.   Examples
 What to Generate                 58    Geochemical Data, Physical Samples, Mathematica (programming) Code, Course Materials
 What Data Types                  37    Gene Sequences, Experimental Data, Interview Transcript, Video Recordings
 How to Create Data               25    Experimental Setup, Field Observation, Simulation, Survey, Interviews
 Where to Get Existing Data       13    Moore Laboratory of Zoology, ArcView/GIS Inventories, Prior Study’s Database
Metadata Standard
 Codes                          Freq.   Examples
 Data Format                      38    CSV file, TEMPO data file, XML format, SPSS file, plain text
 Metadata Form                    31    ArcGIS Metadata file, XML-based standard file, GIS database file
 How to Create Metadata           14    Use existing metadata standards, or develop their own metadata standards
 Which Metadata Standard          15    Dublin Core, DNA Sequence Metadata, EML (Ecological Metadata Language)
 Contextual Details Needed        10    All aspects of the development project documented, experimental procedure record
 Data Discoverability              7    Searches Built into Library, Searchable through Project Website
Data Access & Sharing Process
 Codes                          Freq.   Examples
 When Available                   28    Post-Publication, Post-Project, After Data Collection
 How Available                    37    Upon Request, Project Website, GMOD CHADO databases, Institutional Repository
 What Available                   33    Original research data (genome assemblies), survey data, educational materials
 Process for Gaining Access       25    Email Request, Material Transfer Agreement, Direct Access from Web or Repository
 How Long Retain the Right        18    Withhold until Publication, Years after Project Ends, Years after Data Production
 Embargo Period                    5    Years after data collection, Period for commercialization
 Ethical/Privacy Issues           21    Privacy information is not available for public
 Compliance with IRB Protocol     13    IRB application submission for human subject research
 Whose Intellectual Property      17    Property of the PI and Co-PIs, Institutions, Open-Access
Data Archiving
 Codes                               Freq.   Examples
 Strategy for Archiving Data           31    Hosted on the Web Servers at (university), ICPSR, disciplinary data repository
 Which Repository                      55    Organization website, institutional or discipline data repository
 Procedures for Long-Term Storage      33    Submitted to databanks including NCBI GEO, GenBank, DataONE, Dryad
 Data Preservation Period              11    Minimum of five years post-grant funding, Long-term preservation through disciplinary data repositories
 What Data Preserved for Long-Term      7    All data and materials generated by this award, Genome Sequencing Data
 Transformation Required                4    Keeping raw image data in its uncompressed form, transferred to IRI format
 Data Documentation Submitted          11    Contextual details about experimental procedures, all aspects of the development project
 Related Information Submitted          3    Metadata files, proposed study information, companion web page
Data Reuse Plan
 Codes                          Freq.   Examples
 Reusability of the Data           6    Descriptions about reusable methods (Used by a research community to follow-up)
 Restrictions to Access            6    Access allowed for a certain group of researchers
 Groups Interested In              8    Wider research community studying the Great Lakes, academic geography organizations, and geography teacher associations
 Foreseeable Uses/Users           10    Available to engineers, clinicians, and medical researchers, sociologists and psychologists working in relevant sub-fields.

Others
 Codes                          Freq.   Examples
 Data Lifecycle                    1    Application of the Life Cycle Inventory databases
 Data Curation                     4    Curation (Consortiums and Partnerships)
 Budget                            9    Institution will absorb costs, no incremental costs, marginal costs
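The Freq. columns in the code tables can be rolled up to see how coded content is distributed across the six categories of the coding scheme. The Python sketch below simply sums the reported frequencies per category; the totals count coded utterances, not DMPs, since a single DMP can contribute several utterances to one category:

```python
# Code frequencies as reported in the code tables, grouped by category.
FREQUENCIES: dict[str, dict[str, int]] = {
    "Types of Data": {
        "What to Generate": 58, "What Data Types": 37,
        "How to Create Data": 25, "Where to Get Existing Data": 13,
    },
    "Metadata Standard": {
        "Data Format": 38, "Metadata Form": 31, "How to Create Metadata": 14,
        "Which Metadata Standard": 15, "Contextual Details Needed": 10,
        "Data Discoverability": 7,
    },
    "Data Access & Sharing Process": {
        "When Available": 28, "How Available": 37, "What Available": 33,
        "Process for Gaining Access": 25, "How Long Retain the Right": 18,
        "Embargo Period": 5, "Ethical/Privacy Issues": 21,
        "Compliance with IRB Protocol": 13, "Whose Intellectual Property": 17,
    },
    "Data Archiving": {
        "Strategy for Archiving Data": 31, "Which Repository": 55,
        "Procedures for Long-Term Storage": 33, "Data Preservation Period": 11,
        "What Data Preserved for Long-Term": 7, "Transformation Required": 4,
        "Data Documentation Submitted": 11, "Related Information Submitted": 3,
    },
    "Data Reuse Plan": {
        "Reusability of the Data": 6, "Restrictions to Access": 6,
        "Groups Interested In": 8, "Foreseeable Uses/Users": 10,
    },
    "Others": {"Data Lifecycle": 1, "Data Curation": 4, "Budget": 9},
}


def category_totals(freqs: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum the code frequencies within each category."""
    return {category: sum(codes.values()) for category, codes in freqs.items()}


totals = category_totals(FREQUENCIES)
# Print categories from most to least coded content.
for category, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:30} {total:4}")
```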
Data Available -
 (number of DMPs)
 After data collection           3
 After project ends              3
 After publication              10
 Years after data collection     3
 Years after project ends        8
 Years after publication         1
 Not Specified                  27
 Not Mentioned                  13
Types of Data Repositories for Long-Term Archiving
 (number of DMPs)
 Disciplinary Repository             11
 External/Commercial Storage          4
 Institutional Repository            14
 Internal/Institutional Storage      11
 Journal Repository/Supplement        2
 Lab/Organization Website            13
 Not mentioned/Specified             13
Some insights – DMPs’ Preliminary Analysis
• DMPs describe informal/personal data sharing procedures more often than
  formal/institutionalized data sharing and management plans

• Most DMPs lack content on “Metadata Standard” and “Data
  Reuse Plan”

• Few DMPs include plans for long-term archiving; plans and ideas
  about long-term use of the data are very vague

• Many DMPs address data archiving in institutional repositories
  that do not exist yet but are expected to be created

• A few DMPs state that interview transcripts will be made available,
  but do not address IRB issues
Future Directions

 • Survey a larger number of awardees

 • More exhaustive coding analysis and in-depth
   exploration of the DMPs’ content

 • Analysis of DMPs to identify patterns, common
   challenges and best practices across and within
   different disciplinary communities
Thank you!
 rcurty@syr.edu




Let’s Go Orange!

RDAP 16 Poster: Librarian Research Data: Customizing the DMP Assistant for Pr...RDAP 16 Poster: Librarian Research Data: Customizing the DMP Assistant for Pr...
RDAP 16 Poster: Librarian Research Data: Customizing the DMP Assistant for Pr...ASIS&T
 

Plus de ASIS&T (20)

RDAP 16: Sustaining Research Data Services (Panel 2: Sustainability)
RDAP 16: Sustaining Research Data Services (Panel 2: Sustainability)RDAP 16: Sustaining Research Data Services (Panel 2: Sustainability)
RDAP 16: Sustaining Research Data Services (Panel 2: Sustainability)
 
RDAP 16: Sustainability of data infrastructure: The history of science scienc...
RDAP 16: Sustainability of data infrastructure: The history of science scienc...RDAP 16: Sustainability of data infrastructure: The history of science scienc...
RDAP 16: Sustainability of data infrastructure: The history of science scienc...
 
RDAP 16: Data Management Plan Perspectives (Panel 5, DMPs and Public Access)
RDAP 16: Data Management Plan Perspectives (Panel 5, DMPs and Public Access)RDAP 16: Data Management Plan Perspectives (Panel 5, DMPs and Public Access)
RDAP 16: Data Management Plan Perspectives (Panel 5, DMPs and Public Access)
 
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
 
RDAP 16 Poster: Interpreting Local Data Policies in Practice
RDAP 16 Poster: Interpreting Local Data Policies in PracticeRDAP 16 Poster: Interpreting Local Data Policies in Practice
RDAP 16 Poster: Interpreting Local Data Policies in Practice
 
RDAP 16 Poster: Measuring adoption of Electronic Lab Notebooks and their impa...
RDAP 16 Poster: Measuring adoption of Electronic Lab Notebooks and their impa...RDAP 16 Poster: Measuring adoption of Electronic Lab Notebooks and their impa...
RDAP 16 Poster: Measuring adoption of Electronic Lab Notebooks and their impa...
 
RDAP 16 Poster: Responding to Data Management and Sharing Requirements in the...
RDAP 16 Poster: Responding to Data Management and Sharing Requirements in the...RDAP 16 Poster: Responding to Data Management and Sharing Requirements in the...
RDAP 16 Poster: Responding to Data Management and Sharing Requirements in the...
 
RDAP 16 Lightning: Spreading the love: Bringing data management training to s...
RDAP 16 Lightning: Spreading the love: Bringing data management training to s...RDAP 16 Lightning: Spreading the love: Bringing data management training to s...
RDAP 16 Lightning: Spreading the love: Bringing data management training to s...
 
RDAP 16 Lightning: RDM Discussion Group: How'd that go?
RDAP 16 Lightning: RDM Discussion Group: How'd that go?RDAP 16 Lightning: RDM Discussion Group: How'd that go?
RDAP 16 Lightning: RDM Discussion Group: How'd that go?
 
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
 
RDAP 16 Lightning: Working Across Cultures: Data Librarian as Knowledge Broker
RDAP 16 Lightning: Working Across Cultures: Data Librarian as Knowledge BrokerRDAP 16 Lightning: Working Across Cultures: Data Librarian as Knowledge Broker
RDAP 16 Lightning: Working Across Cultures: Data Librarian as Knowledge Broker
 
RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe...
RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe...RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe...
RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe...
 
RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys...
RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys...RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys...
RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys...
 
RDAP 16 Lightning: Personas as a Policy Development Tool for Research Data
RDAP 16 Lightning: Personas as a Policy Development Tool for Research DataRDAP 16 Lightning: Personas as a Policy Development Tool for Research Data
RDAP 16 Lightning: Personas as a Policy Development Tool for Research Data
 
RDAP 16 Lightning: Growing Data in Utah: A Model for Statewide Collaboration
RDAP 16 Lightning: Growing Data in Utah: A Model for Statewide CollaborationRDAP 16 Lightning: Growing Data in Utah: A Model for Statewide Collaboration
RDAP 16 Lightning: Growing Data in Utah: A Model for Statewide Collaboration
 
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...
 
RDAP 16: How do we know where to grow? Assessing Research Data Services at th...
RDAP 16: How do we know where to grow? Assessing Research Data Services at th...RDAP 16: How do we know where to grow? Assessing Research Data Services at th...
RDAP 16: How do we know where to grow? Assessing Research Data Services at th...
 
RDAP 16: I built it. They came. Now what? (Panel 2, Sustainability)
RDAP 16: I built it. They came. Now what? (Panel 2, Sustainability)RDAP 16: I built it. They came. Now what? (Panel 2, Sustainability)
RDAP 16: I built it. They came. Now what? (Panel 2, Sustainability)
 
RDAP 16: Building Sustainable Services at the Small(er) Scale (Panel 4, Measu...
RDAP 16: Building Sustainable Services at the Small(er) Scale (Panel 4, Measu...RDAP 16: Building Sustainable Services at the Small(er) Scale (Panel 4, Measu...
RDAP 16: Building Sustainable Services at the Small(er) Scale (Panel 4, Measu...
 
RDAP 16 Poster: Librarian Research Data: Customizing the DMP Assistant for Pr...
RDAP 16 Poster: Librarian Research Data: Customizing the DMP Assistant for Pr...RDAP 16 Poster: Librarian Research Data: Customizing the DMP Assistant for Pr...
RDAP 16 Poster: Librarian Research Data: Customizing the DMP Assistant for Pr...
 

RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

  • 1. What have Scientists Planned for Data Sharing and Reuse? A Content Analysis of NSF Awardees’ Data Management Plans Renata Curty, Youngseek Kim & Dr. Jian Qin Baltimore, 4-5 April 2013
  • 2. Motivation: While the NSF mandate gives researchers plenty of flexibility to define their own DMPs, and many academic institutions provide DMP writing support, little is known about how scientists articulate their data management strategies in their DMPs.
  • 3. Study Design: Online survey (20 questions); target population: NSF awardees of standard grants from January 18, 2011 to November 5, 2012 (total 16,065); random sample: 1,606 cases; pilot study: 100 awardees (survey reformulation); final deployment: 966 awardees, 169 responses (17.5%) and 68 DMPs
  • 4. Awards Info: [charts: Amount Awarded; NSF Directorate (BIO, CISE, EHR, ENG, GEO, MPS, SBE), with directorate shares between 10% and 18%; N = 166 for each chart]
  • 5. Awardees Info: Age (N = 150): 65+: 7%; 55-64: 19%; 45-54: 26%; 35-44: 41%; 25-34: 7%. Organization type (N = 151): Academia, 93%
  • 6. Awardees Info: Position in academia: Full Professor 40%, Associate Professor 28%, Assistant Professor 22%, Researcher 6.77%; others: Dean (3), Professor Emeritus (1), Professor of Practice (1), Lecturer/Instructor (1), Post-Doctoral Fellow (1), Emeritus Senior Scientist, Director, Expert Consultant, Administrative Faculty Position, Chair. Tenure status: Tenured 62%, On Tenure Track 25%, Non-Tenure Track 11%, Retired 2%. (N = 143 and 138)
  • 7. Geographical Distribution: [map created with Google Fusion Tables; N = 109]
  • 8. Three statements rated on a 7-point scale (Strongly disagree, Disagree, Somewhat disagree, Neither agree nor disagree, Somewhat agree, Agree, Strongly agree): “DMP is important to formalize data sharing practices in science” (N = 166; mean = 4.93, SD = 1.62); “Writing a DMP for NSF proposal is a challenging task” (N = 167; mean = 3.89, SD = 1.45); “DMP is difficult to execute” (N = 167; mean = 3.79, SD = 1.51)
  • 9. Types of Data: 3D Models 13.01% (19); Audio Files 12.33% (18); Curriculum Materials 21.23% (31); Data Models 27.40% (40); Field Notes 26.03% (38); Experimental Data 63.70% (93); Images 36.99% (54); Interview Transcripts 17.12% (25); Patient Records 0.68% (1); Samples 20.55% (30); Software 35.62% (52); Spreadsheets 40.41% (59); Video Files 21.23% (31); others: Computational Models, Surveys, DNA Sequences, Computer Codes, Crowdsourcing Data (Reviews). Documentation of Data, will follow: disciplinary practices 46%; research project’s needs 37%; institutional recommendations/guidelines 17%. (N = 158)
  • 10. Challenges Encountered: None (26%); Which stage(s) of research to share the data; Data Description & Documentation; Lack of guidance from my institution; Level of granularity of data; Lack of guidance from NSF; Appropriate infrastructure to archive/preserve data. Others: some projects do not generate data; conflict between DMP requirement and IRB requirements regarding social and behavioral research data; intellectual property and data protection conflicts; long-term preservation issues; conflicts of individual/group vs. institutional strategies. (N = 169)
  • 11. Data Access & Availability: Open 45%; Available with some restrictions 51%; Restricted 5%. Channels: By email request 45.52% (61); Personal website 17.91% (24); Research group/project website 51.49% (69); Institutional repository 20.15% (27); Disciplinary repository 32.84% (44); others: “Publications”, “Available to NSF only”. (N = 167)
  • 12. Barriers for Data Reuse: [word cloud; N = 164]
  • 13. Reuse Issues - Privacy, Anonymity & Confidentiality “IRB restrictions on ability to share even deidentified data. Concern that sharing even deidentified data will discourage participation in the study.” “For myself, no. But for others to use my data, yes: for qualitative data, under IRB requirements for the protection of human subjects around confidentiality and anonymity, DMPs are nearly impossible to implement without perhaps some kind of temporal restriction on them (like, ‘This archive can only be opened in 20 - 30 - 40 years’ or something like that)” “The project involves human subject; so protections have to be put in place that may limit reuse applications in the future.” “HIPAA [Health Insurance Portability and Accountability Act] issues - obtaining self reporting data on human subjects.”
  • 14. Reuse Issues - Context, Time Factor & Documentation “My past data was collected on a unique system built specifically for the research project. Need lots of context to reuse the data.” “The only problems I see is that data can be taken out of context in a way that produces results that might not be correct.” “Data is specific to testing scenarios. The insight gleaned from our experimental data is of more importance than the data itself.” “My data is for specific purposes and it is hard to conceive of how someone would use it for something else/different. Even with a significant amount of metadata it would be difficult for someone to know all the circumstances under which the data was collected and why it was collected.” “All scientific data is collected in particular context. Mechanisms that facilitate the description of that context are lacking. The creation of metadata that provides this information is a cumbersome, boring task and there are few resources available to ease the burden.”
  • 15. Reuse Issues - Format, Tools, Infrastructure Interoperability & Standards “Systems are always changing...It would be best if we could upload data to NSF so that it will be publicly available in the same way NIST [National Institute of Standards and Technology] publishes data.” “Our raw data formats are extremely large, and need to be compressed into reduced, on-line archives for sharing. It is not possible for me as an individual PI to archive the raw data for others to examine.” “My data is generally related to large software artifacts, so using it could involve quite a bit of work to get those artifacts running. This is something that I explicitly try to come up with solutions for in my DMPs.” “Until NSF provides a free national repository for data archiving, we will not make progress in this area. If such an archive was available, it would be sensible to require researchers to place data there at the end of a grant and would allow other researchers to take advantage of it in a practical way.”
  • 16. DMPs – Preliminary Content Analysis • Coding scheme: used both deductive and inductive approaches; 35 codes, drawn from the NSF DMP policy and the University of Virginia's guideline or emerging from DMP statements • Data analysis procedure: a total of 766 utterances were identified, 642 of them unique
  • 17. DMPs’ Content: [Wordle word cloud generated from the frequency of each code across the 68 DMPs]
  • 18. Coding Scheme • Types of Data: What to Generate; What Data Types; How to Create Data; Where to Get Existing Data • Metadata Standards: Data Format; Metadata Form; How to Create Metadata; Which Metadata Standard; Contextual Details Needed; Discoverability of the Data • Data Access & Sharing Process: When Available; How Available; What Available; Process for Gaining Access; How Long Retain the Right; Embargo Period; Ethical/Privacy Issues; Compliance with IRB Protocol; Whose Intellectual Property • Data Archiving Plan: Strategy for Archiving Data; Which Repository; Procedures for Long-Term Storage; Data Preservation Period; What Data Preserved for Long-Term; Transformation Required; Data Documentation; Related Information • Data Reuse Plan: Reusability of the Data; Restrictions to Access; Groups Interested In; Foreseeable Uses/Users • Others: Data Lifecycle; Data Curation; Budget
  • 19. Types of Data (code (frequency): examples)
      What to Generate (58): Geochemical Data, Physical Samples, Mathematica (programming) Code, Course Materials
      What Data Types (37): Gene Sequences, Experimental Data, Interview Transcript, Video Recordings
      How to Create Data (25): Experimental Setup, Field Observation, Simulation, Survey, Interviews
      Where to Get Existing Data (13): Moore Laboratory of Zoology, ArcView/GIS Inventories, Prior Study’s Database
    Metadata Standard (code (frequency): examples)
      Data Format (38): CSV file, TEMPO data file, XML format, SPSS file, plain text
      Metadata Form (31): ArcGIS Metadata file, XML-based standard file, GIS database file
      How to Create Metadata (14): Use existing metadata standards, or develop their own metadata standards
      Which Metadata Standard (15): Dublin Core, DNA Sequence Metadata, EML (Ecological Metadata Language)
      Contextual Details Needed (10): All aspects of the development project documented, experimental procedure record
      Data Discoverability (7): Searches Built into Library, Searchable through Project Website
  • 20. Data Access & Sharing Process (code (frequency): examples)
      When Available (28): Post-Publication, Post-Project, After Data Collection
      How Available (37): Upon Request, Project Website, GMOD CHADO databases, Institutional Repository
      What Available (33): Original research data (genome assemblies), survey data, educational materials
      Process for Gaining Access (25): Email Request, Material Transfer Agreement, Direct Access from Web or Repository
      How Long Retain the Right (18): Withhold until Publication, Years after Project Ends, Years after Data Production
      Embargo Period (5): Years after data collection, Period for commercialization
      Ethical/Privacy Issues (21): Privacy information is not available for public
      Compliance with IRB Protocol (13): IRB application submission for human subject research
      Whose Intellectual Property (17): Property of the PI and Co-PIs, Institutions, Open-Access
  • 21. Data Archiving (code (frequency): examples)
      Strategy for Archiving Data (31): Hosted on the Web Servers at (university), ICPSR, disciplinary data repository
      Which Repository (55): Organization website, institutional or discipline data repository
      Procedures for Long-Term Storage (33): Submitted to databanks including NCBI GEO, GenBank, DataONE, Dryad
      Data Preservation Period (11): Minimum of five years post-grant funding, Long-term preservation through disciplinary data repositories
      What Data Preserved for Long-Term (7): All data and materials generated by this award, Genome Sequencing Data
      Transformation Required (4): Keeping raw image data in its uncompressed form, transferred to IRI format
      Data Documentation Submitted (11): Contextual details about experimental procedures, all aspects of the development project
      Related Information Submitted (3): Metadata files, proposed study information, companion web page
  • 22. Data Reuse Plan (code (frequency): examples)
      Reusability of the Data (6): Descriptions about reusable methods (used by a research community to follow up)
      Restrictions to Access (6): Access allowed for a certain group of researchers
      Groups Interested In (8): Wider research community studying the Great Lakes, academic geography organizations, and geography teacher associations
      Foreseeable Uses/Users (10): Available to engineers, clinicians, and medical researchers, sociologists and psychologists working in relevant sub-fields
    Others (code (frequency): examples)
      Data Lifecycle (1): Application of the Life Cycle Inventory databases
      Data Curation (4): Curation (Consortiums and Partnerships)
      Budget (9): Institution will absorb costs, no incremental costs, marginal costs
  • 23. Data Available: [bar chart of the number of DMPs per category: After data collection, After project ends, After publication, Years after data collection, Years after project ends, Years after publication, Not specified, Not mentioned]
  • 24. Types of Data Repositories for Long-Term Archiving: [bar chart of the number of DMPs per repository type: Disciplinary Repository, External/Commercial Storage, Institutional Repository, Internal/Institutional Storage, Journal Repository/Supplement, Lab/Organization Website, Not mentioned/Specified]
  • 25. Some insights – DMPs’ Preliminary Analysis: data sharing procedures are mostly informal/personal rather than formal/institutionalized data sharing and management plans; most DMPs lack content on “Metadata Standard” and “Data Reuse Plan”; few have plans for long-term archiving, and plans and ideas about long-term use of the data are very vague; many DMPs propose archiving data in institutional repositories that do not exist yet but are expected to be created; a few DMPs state that interview transcripts will be available without addressing IRB issues
  • 26. Future Directions: survey a larger number of awardees; carry out a more exhaustive coding analysis and in-depth exploration of the DMPs’ content; analyze DMPs to identify patterns, common challenges and best practices across and within different disciplinary communities
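The sampling and response figures on slide 3, and the 40.24% DMP share mentioned in the speaker notes, follow from simple ratios of the reported counts. A minimal Python sketch to recompute them; the constant names and the `rate` helper are illustrative (hypothetical), and only the counts come from the slides:

```python
# Recompute the study-design ratios from the counts reported on slide 3.
# Constant names and the rate() helper are illustrative, not from the study.

TARGET_POPULATION = 16065  # NSF standard-grant awardees, Jan 18, 2011 to Nov 5, 2012
RANDOM_SAMPLE = 1606       # cases drawn at random from the population
DEPLOYED = 966             # awardees in the final survey deployment
RESPONSES = 169            # survey responses received
DMPS_SHARED = 68           # respondents who also shared their DMP


def rate(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


sampling_fraction = rate(RANDOM_SAMPLE, TARGET_POPULATION)
response_rate = rate(RESPONSES, DEPLOYED)
dmp_share = rate(DMPS_SHARED, RESPONSES)

print(f"Sampling fraction: {sampling_fraction:.1f}%")  # Sampling fraction: 10.0%
print(f"Response rate:     {response_rate:.1f}%")      # Response rate:     17.5%
print(f"DMP share:         {dmp_share:.2f}%")          # DMP share:         40.24%
```

Rounded to one decimal, 169/966 gives the 17.5% response rate on slide 3, and 1,606/16,065 ≈ 10.0% matches the "10% of the target population" sample described in the notes.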

Speaker Notes

  1. Random sample of 10% of the target population. After the pilot study (based on just 11 responses, with no DMPs), we decided to add a few questions to the survey to get a better sense of respondents' experiences writing and executing their DMPs. The response rate was affected by other factors, such as wrong/invalid emails, PIs who had changed institutions, PIs on sabbatical or other types of leave (from whom we received automatic responses), and, in a few cases, PIs who passed away after receiving the award. Also, some PIs contacted us to explain that they would not participate because, despite NSF's mandate, their research does not produce data, or because they did not recall having a DMP. Only 40.24% of participants shared their DMP. Does this tell us anything about their willingness to share? In some cases, participants said they did not have it available when they were filling out the survey.
  2. Amount: almost half of the awards fall in the $300,000 to $1 million range. Good distribution of respondents across the 7 NSF directorates, with a slightly larger share for Engineering. BIO – Biological Sciences; CISE – Computer and Information Science and Engineering; EHR – Education and Human Resources; ENG – Engineering; GEO – Geosciences; MPS – Mathematical and Physical Sciences; SBE – Social, Behavioral and Economic Sciences
  3. Respondents fall mostly in the 35-44 age range and, as expected, the majority belong to an academic institution.
  4. Of these, most are tenured and full professors.
  5. The map shows the geographical distribution of our respondents.
  6. Three questions on a 7-point Likert scale (strongly disagree to strongly agree) asked participants about the importance of DMPs for formalizing data sharing practices in science and about writing and executing DMPs. Results show that respondents tend to somewhat agree on the importance, but do not see writing the DMP as challenging or the plan as hard to execute.
  7. When we asked, “Do you foresee barriers to the reuse of the data your research is producing or will produce?”, respondents who answered yes could comment. The word cloud shows the most recurrent topics in those comments.
  8. Some comment excerpts
  9. Skepticism about enforcement or verification; disbelief that it will work without a unified platform. Mention that some participants questioned whether DMP execution would ever be verified, because the data will be so dispersed, resulting in more paperwork for scientists with little effect on the real intention.
  10. This last issue resonated in other comments across the survey.
  11. Reinforce 68 DMPs.
  12. Reuse is still covered very little in the DMPs; in some cases there are only very general statements about potential users.
  13. Not specified: the DMP says the data will be made available, but no time is given. Not mentioned: no reference at all to when data will be available.
  14. More ad hoc procedures. 4th item: counting the chickens before they hatch, as the famous saying goes.
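Slides 16 to 22 report per-code frequencies and a count of total versus unique utterances (766 and 642), but the presentation does not describe how these tallies were produced. The following is a minimal Python sketch of one way to compute them, assuming each coded utterance is stored with its DMP identifier, code, and text. The `CodedUtterance` type, the `tally` function, and the sample records are hypothetical, and counting a statement coded under several codes as one unique utterance is an assumption about what "unique" means here.

```python
from collections import Counter
from typing import NamedTuple


class CodedUtterance(NamedTuple):
    dmp_id: str  # hypothetical identifier of the source DMP
    code: str    # code from the coding scheme, e.g. "Which Repository"
    text: str    # the statement as it appears in the DMP


def tally(utterances: list[CodedUtterance]) -> dict:
    """Summarize coded utterances.

    Assumption: an utterance is "unique" per distinct (dmp_id, text) pair,
    so a statement coded under several codes is counted once.
    """
    total = len(utterances)
    unique = len({(u.dmp_id, u.text) for u in utterances})
    # Frequency: number of coded utterances per code.
    freq = Counter(u.code for u in utterances)
    # Coverage: number of distinct DMPs in which each code appears at least once.
    coverage = Counter(code for _, code in {(u.dmp_id, u.code) for u in utterances})
    return {"total": total, "unique": unique, "freq": freq, "coverage": coverage}


# Hypothetical toy records, not taken from the study.
sample = [
    CodedUtterance("dmp01", "Which Repository", "Data will be deposited in Dryad after publication."),
    CodedUtterance("dmp01", "When Available", "Data will be deposited in Dryad after publication."),
    CodedUtterance("dmp01", "Data Format", "Spreadsheets will be saved as CSV."),
    CodedUtterance("dmp02", "Which Repository", "Data will be archived in the institutional repository."),
    CodedUtterance("dmp02", "Budget", "The institution will absorb costs."),
]

summary = tally(sample)
print(summary["total"], summary["unique"])      # 5 4
print(summary["freq"].most_common(1))           # [('Which Repository', 2)]
print(summary["coverage"]["Which Repository"])  # 2
```

Coverage (how many DMPs mention a code at all) is one way to quantify observations such as most DMPs lacking content on “Metadata Standard” and “Data Reuse Plan”, while frequency counts how often a code is used overall.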