Making It Happen:
Sustainable Data Preservation and Use
March 19, 2013
Anita de Waard
VP Research Data Collaborations, Elsevier RDS
a.dewaard@elsevier.com
“What aspects/tools/capabilities/frameworks are related to this idea?”
• There are many different research databases, both generic
  (Dryad, Dataverse, …) and specific (NIF, IEDA, PDB, …)
• There are many systems for creating/sharing workflows
  (Taverna, MyExperiment, Vistrails, Workflow4Ever, etc.)
• There are many e-lab notebooks
  (LabGuru, LabArchives, LaBlog, etc.)
• There are scores of projects, committees, standards, bodies, grants,
  initiatives, and conferences for discussing and connecting all of this
  (KEfED, Pegasus, PROV, RDA, Science Gateways, Codata, BRDI, Earthcube, etc.)
• You can make a living out of this ;-)! (and many of us do…)
…but this is what scientists do:
Using antibodies and squishy bits, grad students experiment
and enter details into their lab notebook.
The PI then tries to make sense of this, and writes a paper.
End of story.
Why save research data?
A. Data Preservation:
  – Preserve record of scientific process, provenance
  – Enable reproducible research
B. Data Use:
  – Use results obtained by others
  – Do better science!
  – Improve interdisciplinary work
C. Sustainable Models:
  – Technology transfer; societal/industrial development
  – Reward scientists for data creation (credit/attribution)
  – Long-term archiving
Where The Data Goes Now:
> 50 M papers; 2 M scientists; 2 M papers/year
• Majority of data (90%?) is stored on local hard drives
• Some data (8%?) stored in large, generic data repositories:
  – Dryad: 7,631 files
  – Dataverse: 0.6 M
  – Datacite: 1.5 M
• A small portion of data (1-2%?) stored in small, topic-focused data repositories:
  – PDB: 88.3 k
  – TAIR: 72.1 k
  – MiRB: 25 k
  – PetDB: 1.5 k
  – SedDB: 0.6 k
Key Needs:
• DEVELOP SUSTAINABLE MODELS
• INCREASE DATA PRESERVATION
[Figure: the “Where The Data Goes Now” diagram from the previous slide, repeated with these needs overlaid]
Objections (and rebuttals) to data sharing:
• Objection: “Our lab notebooks are all on paper – it’s how we do things”
  Rebuttal: Graft tools closely on scientists’ daily practice.
• Objection: “I need to see a direct benefit of any effort I put in.”
  Rebuttal: Create tools to allow better insight into one’s own and others’ results.
• Objection: “I don’t really trust anyone else’s data – and don’t think they’ll trust mine”
  Rebuttal: Create social networking context and allow data owner to provide granular access control.
• Objection: “I am afraid other people might scoop my discoveries”
  Rebuttal: Reward system moves from a competition to a ‘shared mission’.
From insular ‘CoSI-Factories’…
[Figure: two separate lab cycles side by side, each labeled Prepare, Observe, Analyze, Ponder, Communicate]
…to shared experimental repositories:
Across labs, experiments: track reagents and how they are used
[Figure: two lab cycles (Prepare, Analyze, Communicate) feeding a shared pool of Observations]
…to shared experimental repositories:
Compare outcome of interactions with these entities
[Figure: two lab cycles (Prepare, Analyze, Communicate) feeding a shared pool of Observations]
…to shared experimental repositories:
Build a ‘virtual reagent spectrogram’ by comparing how different entities
interacted in different experiments
[Figure: two lab cycles (Prepare, Analyze, Communicate) feeding a shared pool of Observations, with a shared Think step]
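One way to picture the shared-repository idea on the preceding slides is to group observations pooled from several labs by reagent, and then by the entity each reagent was tested against. The sketch below is illustrative only: the record layout, the lab, reagent, and entity names, and the `reagent_profile` helper are all assumptions, not part of the talk or of any existing repository.

```python
from collections import defaultdict

# Hypothetical pooled records: (lab, experiment, reagent, entity, outcome).
# Every name and value here is invented for illustration.
observations = [
    ("lab_a", "exp_1", "antibody_x", "protein_p", "binds"),
    ("lab_a", "exp_2", "antibody_x", "protein_q", "no_binding"),
    ("lab_b", "exp_7", "antibody_x", "protein_p", "binds"),
    ("lab_b", "exp_8", "antibody_y", "protein_p", "no_binding"),
]


def reagent_profile(records):
    """Group outcomes by reagent, then by the entity it was tested against."""
    profile = defaultdict(lambda: defaultdict(list))
    for lab, experiment, reagent, entity, outcome in records:
        profile[reagent][entity].append((lab, experiment, outcome))
    return profile


profile = reagent_profile(observations)

# Print one line per (reagent, entity) pair: how many observations were
# pooled across labs and which distinct outcomes they reported.
for reagent, by_entity in sorted(profile.items()):
    for entity, hits in sorted(by_entity.items()):
        outcomes = sorted({outcome for _, _, outcome in hits})
        print(reagent, entity, len(hits), outcomes)
```

Keying on the reagent rather than on the lab or the paper is the design choice that lets results from different experiments be lined up against each other, which is roughly what the ‘virtual reagent spectrogram’ slide describes.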
Some examples:
• Grafting tools on workflow: create tailored
  metadata collection tools on mini-tablets
  in labs to replace paper notebook
• Direct rewards: through ‘PI-Dashboard’:
  allow immediate access/analysis of shared
  data: new science!
• Data sharing rewards: Data Rescue Challenge:
  collect and reward stories/practices of data
  preservation/use in Earth/Lunar Science
• Improve data use: With NIF/Eagle-I: add
  antibodies as key ‘entities’ to paper, link to AB repository
How do we make data use happen:
• We are creating repositories of shared experiments:
  you are part of a greater whole!
• Collect and share stories and practices re. data use
  and sustainable systems: “What gets to them?”
• Develop system of rewards for data sharing: enable
  demonstrably better science!
• Work with grant agencies, repositories
  (generic/specific, institutional, cross-national) to
  integrate and annotate existing datasets and enable
  cross-use
• Collectively pioneer long-term funding options;
  support/develop ‘shared mission’ funding challenges

Making Data Sharing Happen
