Research Data Management
Open Science
Daniel Jacob
INRA UMR 1332 BFP – Metabolism Group
Bordeaux Metabolomics Facility
Daniel Jacob – INRA UMR 1332 BFP – Oct 2019 2
At the end of the course you should understand...
• The links between Research Data and Open Science
• How the management and preservation of Research Data can facilitate the work of researchers
• How to address concerns about Data Sharing
• The research data life cycle
The Reproducibility Crisis
In recent years, evidence has emerged from disciplines ranging from biology to
economics that many scientific studies are not reproducible.
This evidence has led to declarations in both the scientific and lay press that
science is experiencing a “reproducibility crisis” and that this crisis has
significant impacts on both science and society, including misdirected effort,
funding, and policy implemented on the basis of irreproducible research.
Franklin Sayre, Amy Riegelman (2018) C&RL 79(1) https://doi.org/10.5860/crl.79.1.2
This phenomenon appears, for example, in medicine, and more precisely in epidemiology: starting from a large number of variables (weight, age at first cigarette, etc.) and a large number of possible outcomes (breast cancer, lung cancer, car accident, etc.), spurious associations are found a posteriori and then statistically "validated".
p-hacking
p-hacking (also called data dredging, data fishing, data snooping, …) is the misuse of data analysis to find patterns in data that can be presented as statistically significant when in fact there is no real underlying effect.
This is done by performing many statistical tests on the data and paying attention only to those that come back with significant results, instead of stating a single hypothesis about an underlying effect before the analysis and then conducting a single test for it.
https://en.wikipedia.org/wiki/Data_dredging
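The mechanism behind p-hacking can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method from the slides: every "outcome" compares two groups drawn from the same distribution (so there is no real effect), and a two-sided z-test with a known standard deviation is used instead of a t-test only to stay within the Python standard library. The function names `dredge` and `family_wise_error_rate` are hypothetical. Under the null hypothesis about 5% of tests still come out "significant" at alpha = 0.05, and the chance of at least one false positive grows quickly with the number of tests.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided z-test for a difference in means, assuming a known common sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def dredge(n_outcomes=100, n_per_group=50, alpha=0.05, seed=1):
    """Test many outcomes for which NO real effect exists; count the 'significant' ones."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_outcomes):
        # Both groups come from the same distribution: the null hypothesis is true.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p_value(a, b) < alpha:
            hits += 1
    return hits


def family_wise_error_rate(k, alpha=0.05):
    """Probability of at least one false positive among k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


false_hits = dredge()                     # on average about alpha * n_outcomes, i.e. around 5
fwer_20 = family_wise_error_rate(20)      # about 0.64: 20 tests, 64% chance of a spurious "finding"
```

Stating a single hypothesis before the analysis (or correcting for the number of tests) is what keeps this error rate at the nominal level.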
Cholesterol and Controversy: Past, present and Future
By Jeanne Garbarino on November 15, 2011
Scientific American - Blog
https://blogs.scientificamerican.com/guest-blog/cholesterol-confusion-and-why-we-should-rethink-our-approach-to-statin-therapy/
Cholesterol controversy
The French paradox: lessons for other countries
Heart. 2004 Jan; 90(1): 107–111.
doi: 10.1136/heart.90.1.107
Jean Ferrières
Plot of death rate from coronary heart disease (1977) correlated with daily dietary intake (from 1976 to 1978) of cholesterol and saturated fat, expressed as the cholesterol/saturated-fat index (CSI) per 1000 kcal
Correlation does not imply causation!
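Why correlation is not causation can be shown with a hypothetical confounder. In this minimal sketch (simulated data, not the data of the plot above), a variable `z` drives both `x` and `y`, while `x` has no effect at all on `y`. The two are still strongly correlated (about 0.8 in expectation with these settings), and the correlation vanishes once the linear effect of `z` is removed from both (a partial correlation computed from regression residuals). The variable names and noise levels are assumptions chosen for illustration.

```python
import random
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(v, z):
    """Residuals of an ordinary least-squares regression of v on z (removes z's linear effect)."""
    mz, mv = fmean(z), fmean(v)
    slope = sum((a - mz) * (b - mv) for a, b in zip(z, v)) / sum((a - mz) ** 2 for a in z)
    return [b - (mv + slope * (a - mz)) for a, b in zip(z, v)]


rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical confounder (e.g. a lifestyle factor)
x = [zi + rng.gauss(0, 0.5) for zi in z]  # x depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]  # y depends on z only: x has no effect on y

raw_r = pearson(x, y)                                  # strong correlation
partial_r = pearson(residuals(x, z), residuals(y, z))  # close to 0 once z is accounted for
```

In observational data the confounder is usually unknown or unmeasured, which is precisely why a correlation alone cannot establish a causal relationship.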
Open Science
[Diagram: during a research project, data, studies and know-how/knowledge as inputs and outputs]
After the project is completed: what do the data become?
Among the possible scenarios, two are extreme:
• Nothing! They sit on a disk (until the disk dies!)
• Creation of a comprehensive database managing all the data and metadata, associated with a visualization and querying interface.
Expected objectives
[Diagram: research project, data and studies after the project is completed]
Expected objectives
[Diagram: data and studies from the research project deposited in scientific data repositories, with enrichment, expected links, publishing policies, …]
https://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-dissemination_en.htm
NATIONAL PLAN FOR OPEN SCIENCE
Announced by Frédérique Vidal on 4 July 2018, the plan makes open access mandatory for publications and for research data from funded projects.
Open science is the practice of making research publications and data freely available (transparency).
Open science seeks to create an ecosystem in which scientific research is more cumulative (interdisciplinary).
Open science makes knowledge accessible to all (civic aspect).
Open science also drives scientific progress (reactivity).
Finally, open science fosters scientific integrity and people's trust in science (ethics).
http://cache.media.enseignementsup-recherche.gouv.fr/file/Recherche/50/1/SO_A4_2018_EN_01_leger_982501.pdf
[Diagram: interdisciplinary data science at the crossroads of the scientific field, IT skills and statistics, covering data management, data analysis, data interpretation, software and data]
Open Science is a new research paradigm facing many challenges, mainly:
• the requirement for many skills
• ingrained research habits
Science today - context
Research paradigms (knowledge creation):
• Experimental science
• Theoretical science
• Data-intensive science / Data-driven science
New paradigm: not only induction and deduction, but above all abduction >> data science
It requires three skills:
• Scientific field
• Information management
• Data processing
What are the consequences for the data? Publications + Data
Abduction
Abduction is a type of reasoning that infers the probable causes of an observed fact.
In other words, it consists in establishing the most probable cause of an observed fact …
… and stating, as a hypothesis, that the fact in question probably results from that cause.
Data Science
Data-driven science
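One common way to formalise abduction ("inference to the best explanation") is to rank candidate causes of an observed fact by their posterior probability using Bayes' rule. This is only one possible formalisation, not necessarily the one intended in the course, and the causes and probabilities below are purely hypothetical. The retained cause is a hypothesis to be tested, not a conclusion.

```python
def rank_causes(priors, likelihoods):
    """Rank candidate causes of an observed fact by posterior probability (Bayes' rule).

    priors[c]      -- P(cause c) before observing the fact
    likelihoods[c] -- P(observed fact | cause c)
    """
    evidence = sum(priors[c] * likelihoods[c] for c in priors)  # P(observed fact)
    posteriors = {c: priors[c] * likelihoods[c] / evidence for c in priors}
    ranking = sorted(posteriors, key=posteriors.get, reverse=True)
    return ranking, posteriors


# Hypothetical observed fact: "the reference peak has shifted in today's spectra".
priors = {"temperature change": 0.5, "pH change": 0.3, "operator error": 0.2}
likelihoods = {"temperature change": 0.2, "pH change": 0.6, "operator error": 0.1}
ranking, posteriors = rank_causes(priors, likelihoods)
# ranking[0] is the most probable cause (here "pH change", posterior 0.6),
# stated as a hypothesis pending further experiments.
```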
What are Research Data?
Data from observation or experimentation, or derived from existing sources, that are analyzed in order to produce or validate original research results.
Examples: digital data tables, text files, sound recordings, completed survey questionnaires, image or video databases, derived or compiled data.
According to the University of Bristol: "Data, or units of information, related to research activities, whether funded or not, are often organized or formatted in such a way that they can be communicated, interpreted and processed. Research Data are all the information you use as part of your research."
Why do we have to "manage" Research Data under the Open Science paradigm?
"Data management should be woven into every course in science."
Data's shameful neglect, Nature 461, 2009 (Editorial) https://www.nature.com/articles/461145a
Data Management facilitates sharing and re-use …
RDM benefits:
• orchestrates data for efficient and reliable use
• increases the impact of research
• improves the visibility of research
• allows data to be shared securely
• makes the data easy to find
• reduces the risk of data loss
• increases citation rates
• is a requirement of most funders and publishers
Manage?... but manage what?
• Primary/secondary
• Experimental, observational, simulation, derived, compiled, canonical
• Raw, processed, aggregated, enriched, annotated, formatted, standardized, published
• Structured/unstructured, homogeneous/heterogeneous
• Free/protected
The data life cycle
Integrate scientific data management into research activities
• Data creation: collection (experiments, measurements, observations, simulations); creation of metadata
• Data processing: enter, format, clean, organize, verify, validate, describe, store
• Data analysis: interpretation, visualization, formatting, publication
• Data preservation: migration, reformatting, back-up, permanent storage; metadata, documentation, certification
• Data dissemination: distribution, referencing, reporting, rights management; data journals
• Re-use: teaching, new research, evaluation
• Curation of data
Research Data Management
Support - skills and professions
• Data Creator: people who produce digital data
• Data Manager: expert on the management, reporting, storage and dissemination of research data
• Data Scientist: data analysis
• IT Manager / System Administrator: "skilled partner" in data archiving and preservation
A wide variety of fields; rapid developments, so continuing training is required; new jobs require more and more IT skills.
The data life cycle
The scientific data life cycle is the set of stages of management, conservation, dissemination and reuse of scientific data related to research activities. At each stage, services can be developed:
- development of a Data Management Plan (DMP)
- identification of metadata describing the data
- selection of repositories to store data
- data retention infrastructures
- data discovery and mining tools
- data reuse framework
https://ec.europa.eu/research/openscience/pdf/os_skills_wgreport_final.pdf
https://www6.inra.fr/datapartage/
Data Management Plan (DMP)
A data management plan (DMP) is a formal document that outlines how data will be obtained, processed, organized, stored, secured, preserved and shared, both during a research project and after the project is completed.
The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation and analysis before the project begins; this ensures that data are well managed in the present and prepared for preservation in the future.
Main step of data management: a tool to be used as soon as a project is set up.
Optimization of Data Sharing and Interoperability of Research: https://dmp.opidor.fr/
Data Management Plan (DMP): Operational Details
Data Management Plan (DMP)
• The project: What does the project consist of? Who are the partners? What is the policy on data management?
• Responsibilities in the project: Who is responsible for the management of the data?
• Resources: How is data management funded, especially in the long term?
• Data collection: What data will be produced/used during the course of the project (type, format, volume and growth...)? How will they be produced and processed?
• Data backup: How, where and by whom will the data be stored, backed up and secured?
• Data access and data sharing: Who will be able to access the data? Will the data be shared? Published? With whom? How? For how long? Under which license?
• Intellectual property: Who will own the data produced? Will external data be used?
• Data archiving: What is the plan for long-term archiving and preservation?
• Data documentation: How will the data be identified and described? What metadata standards will be used? How will the metadata be generated?
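A DMP can be kept next to the project files and checked for completeness as the project progresses. The sketch below is a hypothetical illustration, not the format of any DMP tool: the section keys simply mirror the topics listed above, and `DataManagementPlan` and `missing_sections` are invented names.

```python
from dataclasses import dataclass, field

# Section keys mirror the DMP topics listed above; this encoding is a hypothetical choice.
DMP_SECTIONS = (
    "resources",
    "responsibilities",
    "data_collection",
    "data_backup",
    "data_access_and_sharing",
    "intellectual_property",
    "data_archiving",
    "data_documentation",
)


@dataclass
class DataManagementPlan:
    project: str
    answers: dict = field(default_factory=dict)  # section key -> free-text answer

    def missing_sections(self):
        """Return the DMP sections that still have no non-empty answer."""
        return [s for s in DMP_SECTIONS if not self.answers.get(s, "").strip()]

    def is_complete(self):
        return not self.missing_sections()


dmp = DataManagementPlan(
    project="Hypothetical NMR metabolomics study",
    answers={
        "data_collection": "1H-NMR spectra, about 200 samples, raw files + CSV exports",
        "data_backup": "Nightly copy to institutional storage, managed by the facility",
        "data_documentation": "   ",  # whitespace only: still counted as missing
    },
)
todo = dmp.missing_sections()  # sections that still need an answer before the project starts
```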
RDM based on Open Science: THE FAIR DATA PRINCIPLES
• Findable: describe your data in a data repository; apply a persistent identifier
• Accessible: consider what will be shared; obtain participant consent
• Interoperable: use open formats; consistent vocabulary; common metadata standards
• Reusable: consider permitted use; apply an appropriate license
The FAIR Data Principles are a set of guiding principles to make data findable, accessible, interoperable and reusable (Wilkinson et al., 2016, Scientific Data - https://www.nature.com/articles/sdata201618).
https://www.force11.org/group/fairgroup/fairprinciples
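Some of the practices above can be checked automatically on a dataset's metadata record. This is a minimal, hypothetical sketch: the field names (`identifier`, `access_url`, `license`, …), the list of open formats and the DOI pattern are assumptions for illustration, not a metadata standard or the FAIR principles themselves. Passing such a check does not make data FAIR; it only catches obvious gaps.

```python
import re

# Hypothetical list of open, non-proprietary formats accepted by this check.
OPEN_FORMATS = {"csv", "tsv", "json", "xml", "txt", "nmrml"}
# Loose DOI pattern: "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def fair_issues(record):
    """Return FAIR-related problems found in a metadata record (a dict with hypothetical keys)."""
    issues = []
    if not DOI_PATTERN.match(record.get("identifier", "")):
        issues.append("F: no persistent identifier (DOI)")
    if not record.get("description"):
        issues.append("F: missing description")
    if not record.get("access_url"):
        issues.append("A: no access URL")
    if record.get("format", "").lower() not in OPEN_FORMATS:
        issues.append("I: format is not in the open-format list")
    if not record.get("keywords"):
        issues.append("I: no keywords (ideally from a controlled vocabulary)")
    if not record.get("license"):
        issues.append("R: no license")
    return issues


record = {
    "identifier": "10.1234/example-dataset",        # hypothetical DOI
    "title": "Hypothetical 1H-NMR dataset",
    "description": "Quantified metabolites from a hypothetical plant study.",
    "access_url": "https://example.org/dataset/1",  # placeholder URL
    "format": "CSV",
    "keywords": ["metabolomics", "NMR"],
    "license": "CC-BY-4.0",
}
problems = fair_issues(record)  # empty list: no obvious gap detected
```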
THE FAIR DATA PRINCIPLES
A1.2 => As open as possible, as closed as necessary
THE FAIR DATA PRINCIPLES
5 ★ OPEN DATA
THE FAIR DATA PRINCIPLES
It is above all an approach to measure the maturity of your data with respect to Open Data.
From Principles towards Implementations
The Internet of FAIR Data & Services
https://www.go-fair.org/
DMP model H2020 based on FAIR principles
https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
Guidelines on FAIR Data Management in Horizon 2020
1. Data Summary
2. FAIR data
2.1. Making data findable, including provisions for metadata
2.2. Making data openly accessible
2.3. Making data interoperable
2.4. Increase data re-use (through clarifying licences)
3. Allocation of resources
4. Data security
5. Ethical aspects
6. Other issues
7. Further support in developing your DMP
5 ★ OPEN DATA
Publish data "5 Gold stars"
Tim Berners-Lee, the inventor of the Web and initiator of Linked Data, suggested a 5-star deployment scheme for Open Data:
★ Data on the web, open license
★★ … in a structured format
★★★ … and non-proprietary format
★★★★ … identified by URIs
★★★★★ … and related to others (data)
See also: K. Janowicz et al. (2014) Five Stars of Linked Data Vocabulary Use, Semantic Web 0 (2014) 1–0, https://geog.ucsb.edu/~jano/swj653.pdf
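The 5-star scheme is cumulative: each star is only earned if all the previous ones are. A minimal sketch of that rule follows; the function name and the boolean criteria are hypothetical encodings of the five levels listed above.

```python
def open_data_stars(open_license_on_web, structured, non_proprietary, uses_uris, linked):
    """Return the 5-star Open Data level (0 to 5).

    Stars are cumulative: a level is only reached if every previous level is satisfied.
    """
    stars = 0
    for criterion in (open_license_on_web, structured, non_proprietary, uses_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars


# Illustrative cases:
pdf_scan = open_data_stars(True, False, False, False, False)  # scanned table: 1 star
excel = open_data_stars(True, True, False, False, False)      # proprietary spreadsheet: 2 stars
csv_file = open_data_stars(True, True, True, False, False)    # open tabular format: 3 stars
rdf_linked = open_data_stars(True, True, True, True, True)    # linked data with URIs: 5 stars
```

Note that a file which is structured and non-proprietary but not published under an open license still scores 0, because the first star is missing.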
Research Data Repositories are based on web applications to preserve, share, cite, search and analyse research data.
re3data SERVICE DESCRIPTION
re3data is a global registry of research data repositories from a diverse range of academic disciplines. It provides information on repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions.
…
https://data.inra.fr/
Science Europe's Framework for Discipline-specific Research Data Management
Recommended Data Repositories: https://www.nature.com/sdata/policies/repositories
https://fairsharing.org/databases/
https://data.inra.fr/
…
2,406 Data Repositories (Oct 10, 2019)
https://www.re3data.org/metrics
Not FAIR!!
FAIR?
Reproducible Research
in the context of Open Science
Calls for Open Science & Reproducible Research
Typical examples of where problems can arise
• Some issues often arise when users jump straight into software implementations of methods (e.g. in R) that may lack documentation on the biases and assumptions mentioned in the original papers.
• A major cause of lack of repeatability (often not considered) is the wide sample-to-sample variability in the P value. Because the P value is fickle, the interpretation of analyses should not be based predominantly on this statistic.
Halsey et al. (2015) The fickle P value generates irreproducible results, Nature Methods 12, 179–185
• Overfitting a model is a condition where a statistical model begins to describe the random error in the data rather than the relationships between variables. This problem occurs when the model is too complex. In regression analysis, overfitting can produce misleading R-squared values, regression coefficients, and p-values.
https://statisticsbyjim.com/regression/overfitting-regression-models/
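The sample-to-sample variability of the P value is easy to see by repeating exactly the same experiment many times. This minimal sketch is in the spirit of Halsey et al. but is not their code: the true effect (0.5 standard deviations), the group size (20) and the use of a known-variance z-test instead of a t-test are assumptions made to keep the example within the standard library. With these settings only about one replicate in three reaches p < 0.05, and the p-values range from well below 0.001 to above 0.5, although the underlying effect never changes.

```python
import random
from statistics import NormalDist, fmean, median


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided z-test for a difference in means, assuming a known common sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def replicate_p_values(effect=0.5, n_per_group=20, n_replicates=2000, seed=7):
    """Repeat the SAME experiment (same true effect, same n) and collect the p-values."""
    rng = random.Random(seed)
    ps = []
    for _ in range(n_replicates):
        treated = [rng.gauss(effect, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        ps.append(z_test_p_value(treated, control))
    return ps


ps = replicate_p_values()
share_significant = sum(p < 0.05 for p in ps) / len(ps)  # roughly the statistical power here
spread = (min(ps), median(ps), max(ps))                  # very wide range across replicates
```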
Calls for Open Science & Reproducible Research
Other issues
• Loss of data and/or information:
  - Not regularly backing up your data is considered professional negligence
• Lack of knowledge, lack of technical skills, more or less risky practices:
  - Training is a right, but also a duty, in order to fully assume a function or mission
• Miscellaneous:
  - Continuous evolution of software libraries and their dependencies
  - Problems related to numerical precision from one computer to another
  - Versioning
  - …
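Numerical precision problems often come from floating-point arithmetic not being associative: the same numbers summed in a different order can give a different result. Different machines, libraries or parallel settings may change that order. This minimal sketch reproduces the effect on a single machine; it does not model any specific hardware or library difference.

```python
import math

# Floating-point addition is not associative: the result depends on the order of operations.
values = [1e16, 1.0, -1e16]
left_to_right = (values[0] + values[1]) + values[2]  # 1.0 is absorbed by 1e16, result 0.0
reordered = (values[0] + values[2]) + values[1]      # result 1.0
exact = math.fsum(values)                            # correctly rounded sum, result 1.0

# Accumulated rounding error in a simple sum.
naive_total = sum([0.1] * 10)          # 0.9999999999999999, not 1.0
careful_total = math.fsum([0.1] * 10)  # 1.0

# Compare floating-point results with a tolerance, never with ==.
same_within_tolerance = math.isclose(naive_total, careful_total)
```

For reproducible results, comparisons between runs or machines should therefore use an explicit tolerance, and the tolerance should be documented with the code.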
What Science Requires
Calls for Open Science & Reproducible Research
"Citations to unpublished data and personal communications cannot be used to support claims in a published paper."
"All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science."
Reproducible Research
Research is defined as reproducible when the published results can be replicated using the documented data, code, and methods employed by the author or provider, without the need for any additional information or for communicating with the author or provider.
https://nnlm.gov/data/thesaurus/reproducible-research
Reproducible research:
• is not a guarantee of research quality, but a guarantee of transparency
• contributes to quality but does not replace it
Reproducible Research
Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.
Reproducible Research
Good practices
• Data Collection and Management:
  - Write an information collection protocol: this protocol should be part of the published article
  - Maintain a laboratory notebook
  - Collect data repeatedly AND reproducibly
• Research Compendium:
  - facilitates reproducible research by bringing together in a single virtual "place" the data, code, protocols and documentation related to a research project
  - the full computational environment used to produce the results in the paper (code, data, etc.), which can be used to reproduce the results and create new work based on the research
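A research compendium can start as a simple, predictable folder layout created by script, so that every project looks the same. The layout below is a hypothetical convention chosen for illustration, not a standard; adapt the folder names to your community's practices. The function is safe to re-run on an existing compendium.

```python
from pathlib import Path

# Hypothetical layout; adapt folder names to your community's conventions.
COMPENDIUM_LAYOUT = {
    "data/raw": "Raw data, never modified by hand",
    "data/processed": "Data derived from raw data by the scripts in code/",
    "code": "Scripts and notebooks that produce every result in the paper",
    "protocols": "Collection protocols and SOPs",
    "docs": "Documentation, data dictionary, metadata",
    "results": "Figures and tables generated by the code",
}


def create_compendium(root):
    """Create a research-compendium skeleton with a README describing each folder."""
    root = Path(root)
    for folder, purpose in COMPENDIUM_LAYOUT.items():
        path = root / folder
        path.mkdir(parents=True, exist_ok=True)  # exist_ok: re-running is harmless
        (path / "README.md").write_text(f"# {folder}\n\n{purpose}\n", encoding="utf-8")
    summary = "\n".join(f"- `{f}/`: {p}" for f, p in COMPENDIUM_LAYOUT.items())
    (root / "README.md").write_text(
        f"# Research compendium\n\n{summary}\n", encoding="utf-8"
    )
    return root
```

For example, `create_compendium("my_project")` creates `my_project/data/raw/`, `my_project/code/`, and so on, each with a short README.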
Reproducible Research
Good practices
Manage what? What kind of data/information?
The minimal but mandatory set of files, from RAW DATA to final results (checking, validation, tracing), including:
• Standard Operating Procedures (SOP)
• Data reporting
[Diagram: raw data → processed data]
Reproducible Research
Good practices
Manage what? What kind of data/information?
The minimal but mandatory set of files (checking, validation, tracing)
Example: 1H-NMR Analytical Technique
• The raw NMR spectra (ZIP file)
• The compound attribution zones
• An image of an annotated NMR spectrum
• The calibration file (calibration curves based on standard compounds)
• The Excel worksheet(s) used to calculate the quantification
• The final quantification results file
• Protocol documents that describe each step of the process (Quality Assurance):
  I. Analytical sample preparation
  II. Analytical processing
  III. Data processing
  IV. Quantification
Example of full 1H-NMR data set: http://nmrprocflow.org/ex1
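Before a dataset is archived or shared, a short script can confirm that the minimal file set is present. The file names below are hypothetical stand-ins for the items listed above; a real dataset would use its own naming scheme.

```python
from pathlib import Path

# Hypothetical file names for the minimal 1H-NMR file set listed above.
REQUIRED_FILES = {
    "raw NMR spectra": "raw_spectra.zip",
    "compound attribution zones": "attribution_zones.tsv",
    "annotated spectrum image": "annotated_spectrum.png",
    "calibration file": "calibration.tsv",
    "quantification worksheet": "quantification.xlsx",
    "final quantification results": "quantification_results.tsv",
    "protocol documents": "protocols",
}


def missing_items(dataset_dir):
    """Return the descriptions of required items that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [label for label, name in REQUIRED_FILES.items() if not (root / name).exists()]
```

An empty list means every item of the minimal set is present; it says nothing about whether the files are correct, which remains the role of checking and validation.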
Reproducible Research
Good practices
• Backups:
  - Not regularly backing up your data is considered professional negligence
• Versions and Archives:
  - Safeguarding the successive stages of document development (texts, data, code, etc.) is one of the fundamental building blocks of reproducible research
  - Implementation of a version management strategy
  - Git + local or institutional forge (e.g. Forgemia), GitHub (e.g. github/INRA)
  - Research data repositories (re3data.org)
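A backup is only useful if the copy is identical to the original. This minimal sketch copies a folder and verifies each copied file with a SHA-256 checksum; the function names are hypothetical, and the sketch covers a single copy operation, not a complete backup policy (scheduling, separate storage media and retention still have to be organized).

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """SHA-256 checksum of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(src_dir, dst_dir):
    """Copy every file of src_dir into dst_dir, then check each copy against its original.

    Returns the relative paths whose checksums differ (an empty list means success).
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    mismatches = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        if sha256sum(src) != sha256sum(dst):
            mismatches.append(str(rel))
    return mismatches
```

Storing the checksums alongside the archive also lets you detect silent corruption later, long after the copy was made.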
Reproducible Research
Good advice
• Data exploration:
  - Use tools that you know well or that allow you to gain in efficiency.
But
• Learn to program:
  - Limit the use of graphical user interfaces (GUI) for subtle or repetitive tasks
  - Be able to express clearly, with documentation and without ambiguity, what you want the software to do
  - A program can often be expressed in just a few lines: the higher-level the language, the less there is to write.
• Typical examples of reproducible research comprise compendia of data, code and text files, often organised around an R Markdown source document or a Jupyter notebook.
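As an example of replacing repetitive GUI clicks with a short, documented program, the sketch below applies the same processing step to every file of a folder. The step chosen here (constant-sum normalization of spectra) and the input format (CSV files with a single `intensity` column) are hypothetical choices for illustration; the point is that the whole operation becomes one re-runnable call whose parameters are written down.

```python
import csv
from pathlib import Path


def normalize_constant_sum(intensities, total=1.0):
    """Scale intensities so that they sum to `total` (constant-sum normalization)."""
    s = sum(intensities)
    if s == 0:
        raise ValueError("cannot normalize a spectrum whose intensities sum to zero")
    return [total * v / s for v in intensities]


def normalize_folder(in_dir, out_dir):
    """Apply the same documented step to every CSV file in in_dir (one 'intensity' column)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(Path(in_dir).glob("*.csv")):
        with open(path, newline="") as fh:
            values = [float(row["intensity"]) for row in csv.DictReader(fh)]
        with open(out_dir / path.name, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["intensity"])
            writer.writerows([v] for v in normalize_constant_sum(values))
        processed.append(path.name)
    return processed
```

Running `normalize_folder("spectra/raw", "spectra/normalized")` again after adding new samples gives exactly the same treatment to every file, which is hard to guarantee with manual GUI operations.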
Open Data for Access and Mining
ODAM Framework
Example of a Data Management System in the context of Open Science
http://pmb-bordeaux.fr/dataexplorer/
http://pmb-bordeaux.fr/odam/FAIR_and_DataLife_DJ_Oct2019.pdf
https://nbviewer.jupyter.org/github/djacob65/binder_odam/blob/master/PyODAM_api_PCA.ipynb
Some useful links related to Open Science / Data Management
Research Data - Digital Learning: https://doranum.fr/
CoopIST – Cooperate in Scientific and Technical Information: https://coop-ist.cirad.fr/gerer-des-donnees
INRA services and resources: https://www6.inra.fr/datapartage
The future of science is Open: https://www.fosteropenscience.eu/
Building the social and technical bridges to enable open sharing and re-use of data: https://www.rd-alliance.org/
23 Things: Libraries for Research Data
Books online related to Reproducible Research
Vers une recherche reproductible : Faire évoluer ses pratiques (Towards reproducible research: changing one's practices): https://hal.archives-ouvertes.fr/hal-02144142v1
Reproducible Research with R and RStudio, Second Edition: https://englianhu.files.wordpress.com/2016/01/reproducible-research-with-r-and-studio-2nd-edition.pdf
Reproducibility and Replicability in Science: https://www.nap.edu/catalog/25303/reproducibility-and-replicability-in-science

Research Data Management: a gentle introduction for admin staff
 
Digital Data Sharing: Opportunities and Challenges of Opening Research
Digital Data Sharing: Opportunities and Challenges of Opening ResearchDigital Data Sharing: Opportunities and Challenges of Opening Research
Digital Data Sharing: Opportunities and Challenges of Opening Research
 
Digital Resources for Open Science
Digital Resources for Open ScienceDigital Resources for Open Science
Digital Resources for Open Science
 
Museum collections as research data - October 2019
Museum collections as research data - October 2019Museum collections as research data - October 2019
Museum collections as research data - October 2019
 
The State of Open Data Report by @figshare
The State of Open Data Report  by @figshareThe State of Open Data Report  by @figshare
The State of Open Data Report by @figshare
 
ischools future of data managemente dec2017
ischools future of data managemente dec2017ischools future of data managemente dec2017
ischools future of data managemente dec2017
 
Winning Horizon 2020 with Open Science
Winning Horizon 2020 with Open ScienceWinning Horizon 2020 with Open Science
Winning Horizon 2020 with Open Science
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
 
Open Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon HodsonOpen Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon Hodson
 
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
 
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
 

Plus de Daniel JACOB

Indexator_oct2022.pdf
Indexator_oct2022.pdfIndexator_oct2022.pdf
Indexator_oct2022.pdfDaniel JACOB
 
Make your data great again - Ver 2
Make your data great again - Ver 2Make your data great again - Ver 2
Make your data great again - Ver 2Daniel JACOB
 
Make your data great now
Make your data great nowMake your data great now
Make your data great nowDaniel JACOB
 
Odam: Open Data, Access and Mining
Odam: Open Data, Access and MiningOdam: Open Data, Access and Mining
Odam: Open Data, Access and MiningDaniel JACOB
 

Plus de Daniel JACOB (6)

Indexator_oct2022.pdf
Indexator_oct2022.pdfIndexator_oct2022.pdf
Indexator_oct2022.pdf
 
Make your data great again - Ver 2
Make your data great again - Ver 2Make your data great again - Ver 2
Make your data great again - Ver 2
 
Make your data great now
Make your data great nowMake your data great now
Make your data great now
 
Biostatflow
BiostatflowBiostatflow
Biostatflow
 
Odam: Open Data, Access and Mining
Odam: Open Data, Access and MiningOdam: Open Data, Access and Mining
Odam: Open Data, Access and Mining
 
ERVA-NMR
ERVA-NMRERVA-NMR
ERVA-NMR
 

Dernier

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 

Dernier (20)

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 


Cholesterol controversy
Cholesterol and Controversy: Past, Present and Future. Jeanne Garbarino, 15 November 2011, Scientific American (guest blog). https://blogs.scientificamerican.com/guest-blog/cholesterol-confusion-and-why-we-should-rethink-our-approach-to-statin-therapy/
The French paradox: lessons for other countries. Jean Ferrières, Heart 2004 Jan; 90(1): 107–111. doi: 10.1136/heart.90.1.107
Figure: death rate from coronary heart disease (1977) plotted against daily dietary intake (1976 to 1978) of cholesterol and saturated fat, expressed as the cholesterol/saturated-fat index (CSI) per 1000 kcal.
Correlation does not imply causation!
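The warning that correlation does not imply causation can be made concrete with a toy simulation. This is a minimal sketch on entirely synthetic data (it does not use the CSI or coronary heart disease figures cited above): a hypothetical hidden variable `z` drives both an "exposure" `x` and an "outcome" `y`, which never influence each other. The raw correlation between `x` and `y` is strong, yet it essentially vanishes once the linear effect of `z` is removed (a partial correlation computed from regression residuals).

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, z):
    """Residuals of y after an ordinary least-squares fit on z (removes z's linear effect)."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum((a - mz) ** 2 for a in z)
    return [b - (my + slope * (a - mz)) for a, b in zip(z, y)]

rng = random.Random(42)  # fixed seed so that the run is reproducible
n = 1000
z = [rng.gauss(0, 1) for _ in range(n)]     # hypothetical hidden common cause (confounder)
x = [zi + rng.gauss(0, 0.5) for zi in z]    # "exposure": depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]    # "outcome": depends on z only, never on x

raw = pearson(x, y)
adjusted = pearson(residuals(x, z), residuals(y, z))
print(f"raw correlation x~y:        {raw:.2f}")
print(f"correlation adjusted for z: {adjusted:.2f}")
```

With these settings the raw correlation should be close to 0.8 (each variable has variance 1.25 and covariance 1 with the other), while the adjusted correlation should be close to 0.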

Open Science

During a research project
Figure labels: Research Project; Input: know-how, knowledge; Output: DATA, Studies.

After the project is completed
What becomes of the data and studies? Among the possible scenarios, two are extreme:
• Nothing! The data sit on a disk until the disk dies.
• A comprehensive database is created that manages all data and metadata in their entirety, with a visualization and querying interface.

Expected objectives
Figure labels: Research Project, DATA, Studies, Scientific Data Repositories, Enrichment, Publishing policies …, Expected links.
https://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-dissemination_en.htm

NATIONAL PLAN FOR OPEN SCIENCE
Announced by Frédérique Vidal on 4 July 2018, the plan makes open access mandatory for publications and for project-funded research data.
• Open science is the practice of making research publications and data freely available (transparency).
• Open science seeks to create an ecosystem in which scientific research is more cumulative (interdisciplinarity).
• Open science makes knowledge accessible to all (civic aspect).
• Open science also drives scientific progress (reactivity).
• Finally, open science fosters scientific integrity and people's trust in science (ethics).
http://cache.media.enseignementsup-recherche.gouv.fr/file/Recherche/50/1/SO_A4_2018_EN_01_leger_982501.pdf

Interdisciplinary Data Science
Figure labels: Scientific Field, IT Skills, Statistics, Data Management, Data Analysis, Data Interpretation, Software, Data.
Open Science is a new research paradigm facing many challenges, mainly:
• the need for many different skills;
• ingrained research habits.

Science today: context
Research paradigms of knowledge creation:
• Experimental science
• Theoretical science
• Data-intensive science / data-driven science (the new paradigm)
Data-driven science requires three skills:
• the scientific field
• information management
• data processing
Not only induction and deduction, but above all abduction >> data science.
What are the consequences for the data? Publications + Data.

Abduction
Abduction is a type of reasoning that infers probable causes of an observed fact. In other words, it establishes the most probable cause of an observed fact …
… and states, as a hypothesis, that the fact in question probably results from that cause.
Data Science: data-driven science

What are Research Data?
Original data from observation or experimentation, or data derived from existing sources, that are analyzed in order to produce or validate research results.
Digital data: tables, text files, sound recordings, completed survey questionnaires, image or video databases, derived or compiled data.
“Data, or units of information, related to research activities, whether funded or not, are often organized or formatted in such a way that they can be communicated, interpreted and processed. Research Data are all the information you use as part of your research.” (University of Bristol)

Why do we have to "manage" Research Data based on the Open Science paradigm?
“Data management should be woven into every course in science.” Data's shameful neglect, Nature 461, 2009 (Editorial). https://www.nature.com/articles/461145a
Data management facilitates sharing and re-use …
RDM benefits:
• orchestrates data for efficient and reliable use
• increases the impact of research
• improves the visibility of research
• allows data to be shared securely
• makes it easy to find the data
• reduces the risk of data loss
• increases citation rates
• is a requirement of most funders and publishers

Manage?... but manage what?
• Primary / secondary
• Experimental, observational, simulation, derived, compiled, canonical
• Raw, processed, aggregated, enriched, annotated, formatted, standardized, published
• Structured / unstructured, homogeneous / heterogeneous
• Free / protected

The data life cycle
Integrate scientific data management into research activities.
• Data creation: data collection (experiments, measurements, observations, simulations); creation of metadata
• Data processing: enter, format, clean, organize, verify, validate, describe, store
• Data analysis: interpretation, visualization, formatting, publication
• Data preservation: migration, reformatting, back-up, permanent storage, metadata, documentation, certification
• Data dissemination: distribution, referencing, reporting, rights management, data journals
• Re-use: teaching, new research, evaluation
Curation of data

Research Data Management support: skills and professions
The scientific data life cycle is the set of stages of management, conservation, dissemination and reuse of scientific data related to research activities. At each stage of the data life cycle, services can be developed:
• development of a Data Management Plan (DMP)
• identification of metadata describing the data
• selection of repositories to store the data
• data retention infrastructures
• data discovery and mining tools
• data reuse framework
Roles:
• IT Manager / System Administrator: “skilled partner” in data archiving and preservation
• Data Creator: people who produce digital data
• Data Manager: expert on the management, reporting, storage and dissemination of research data
• Data Scientist: data analysis
A wide variety of fields; rapid developments that require continuing training; new jobs require more and more IT skills.
https://ec.europa.eu/research/openscience/pdf/os_skills_wgreport_final.pdf

Data Management Plan (DMP)
A data management plan (DMP) is a formal document that outlines how data will be obtained, processed, organized, stored, secured, preserved and shared, both during a research project and after it is completed. The goal of a DMP is to consider the many aspects of data management, metadata generation, data preservation and analysis before the project begins; this ensures that data are well managed in the present and prepared for preservation in the future.
• Main step of data management
• Tool to be used as soon as projects are set up
• Optimization of data sharing and interoperability of research
https://www6.inra.fr/datapartage/
https://dmp.opidor.fr/

Data Management Plan (DMP): operational details

Data Management Plan (DMP): key questions
• Resources: How is data management funded, especially in the long term?
• Responsibilities in the project: What does the project consist of? Who are the partners? What is the data management policy? Who is responsible for managing the data?
• Data collection: What data will be produced or used during the project (type, format, volume and growth...)? How will they be produced and processed?
• Data backup: How, where and by whom will the data be stored, backed up and secured?
• Data access and data sharing: Who will be able to access the data? Will the data be shared or published? With whom? How? For how long? Under which license?
• Intellectual property: Who will own the data produced? Will external data be used?
• Data archiving: What is the plan for long-term archiving and preservation?
• Data documentation: How will the data be identified and described? What metadata standards will be used? How will the metadata be generated?
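Treating the DMP as structured data rather than free text makes it easy to see which questions are still unanswered while the project is being set up. A minimal sketch: the section keys below are hypothetical and simply mirror the eight themes listed above; they are not the schema of DMP OPIDoR or of any funder template, and the draft answers are invented placeholders.

```python
# Hypothetical section keys mirroring the eight DMP themes above;
# not the schema of DMP OPIDoR or of any funder template.
DMP_SECTIONS = (
    "resources",
    "responsibilities",
    "data_collection",
    "data_backup",
    "data_access_and_sharing",
    "intellectual_property",
    "data_archiving",
    "data_documentation",
)

def missing_sections(dmp: dict) -> list:
    """Return the DMP sections that are absent or left blank, in template order."""
    return [s for s in DMP_SECTIONS if not dmp.get(s, "").strip()]

# Invented placeholder answers for an early draft
draft_dmp = {
    "resources": "Storage costs covered by the lab budget (placeholder).",
    "data_collection": "NMR spectra, roughly 50 GB (placeholder).",
    "data_backup": "   ",  # not written yet
}
print(missing_sections(draft_dmp))
```

Running the check regularly (for example before each project milestone) keeps the plan a living document rather than a one-off form.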

RDM based on Open Science: THE FAIR DATA PRINCIPLES
The FAIR Data Principles are a set of guiding principles to make data findable, accessible, interoperable and reusable (Wilkinson et al., 2016, Scientific Data, https://www.nature.com/articles/sdata201618). https://www.force11.org/group/fairgroup/fairprinciples
• Findable: describe your data in a data repository; apply a persistent identifier
• Accessible: consider what will be shared; obtain participant consent
• Interoperable: use open formats; a consistent vocabulary; common metadata standards
• Reusable: consider permitted use; apply an appropriate license
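The checklist above can be turned into a quick self-assessment of a dataset description before deposit. A minimal sketch under stated assumptions: the record's field names (`doi`, `description`, `access`, `format`, `license`), the list of open formats and the DOI shown are all hypothetical; only the DOI pattern follows the commonly used "10.<registrant>/<suffix>" shape. A real assessment would rely on the repository's own metadata schema.

```python
import re

# Hypothetical list of open formats; extend to suit your discipline.
OPEN_FORMATS = {"csv", "tsv", "txt", "json", "xml"}
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def fair_gaps(record: dict) -> list:
    """List FAIR gaps in a hypothetical metadata record (field names are illustrative)."""
    gaps = []
    # Findable: persistent identifier and a description
    if not DOI_PATTERN.fullmatch(record.get("doi", "")):
        gaps.append("F: no persistent identifier (DOI)")
    if not record.get("description"):
        gaps.append("F: no description")
    # Accessible: the access conditions are stated explicitly
    if record.get("access") not in {"open", "restricted", "embargoed", "closed"}:
        gaps.append("A: access conditions not stated")
    # Interoperable: an open file format
    if record.get("format", "").lower() not in OPEN_FORMATS:
        gaps.append("I: file format not in the open-format list")
    # Reusable: a license
    if not record.get("license"):
        gaps.append("R: no license")
    return gaps

example = {  # hypothetical record and DOI
    "doi": "10.12345/example-dataset",
    "description": "Hypothetical example dataset",
    "access": "open",
    "format": "xlsx",
    "license": "",
}
print(fair_gaps(example))
```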

THE FAIR DATA PRINCIPLES
A1.2 => As open as possible, as closed as necessary

THE FAIR DATA PRINCIPLES and 5 ★ OPEN DATA

THE FAIR DATA PRINCIPLES
From Principles towards Implementations: the Internet of FAIR Data & Services (https://www.go-fair.org/)
FAIR is above all an approach to measure the maturity of your data in relation to Open Data.

DMP model H2020 based on FAIR principles
Guidelines on FAIR Data Management in Horizon 2020: https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
1. Data Summary
2. FAIR data
2.1. Making data findable, including provisions for metadata
2.2. Making data openly accessible
2.3. Making data interoperable
2.4. Increase data re-use (through clarifying licences)
3. Allocation of resources
4. Data security
5. Ethical aspects
6. Other issues
7. Further support in developing your DMP

5 ★ OPEN DATA: publish data with "5 gold stars"
Tim Berners-Lee, the inventor of the Web and initiator of Linked Data, suggested a 5-star deployment scheme for Open Data:
★ data on the web, under an open license
★★ … in a structured format
★★★ … in a non-proprietary format
★★★★ … identified by URIs
★★★★★ … and linked to other data
See also: K. Janowicz et al. (2014) Five Stars of Linked Data Vocabulary Use. Semantic Web 0 (2014) 1–0. https://geog.ucsb.edu/~jano/swj653.pdf
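The scheme is cumulative: each star presupposes all the previous ones, so a dataset in an open format but without an open license still has zero stars. A minimal sketch of that logic; the parameter names are hypothetical, and deciding whether a given file is "structured" or "non-proprietary" remains a human judgement.

```python
def open_data_stars(on_web_open_license: bool, structured: bool,
                    non_proprietary: bool, uses_uris: bool, linked: bool) -> int:
    """Count 5-star Open Data stars; a star only counts if all previous ones are met."""
    stars = 0
    for criterion in (on_web_open_license, structured, non_proprietary, uses_uris, linked):
        if not criterion:
            break  # the scheme is cumulative, so stop at the first unmet level
        stars += 1
    return stars

# A spreadsheet in a proprietary format, published on the web under an open license
print(open_data_stars(True, True, False, False, False))
# The same table exported as CSV
print(open_data_stars(True, True, True, False, False))
```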

Research data repositories
Research data repositories are web applications used to preserve, share, cite, search and analyse research data …
re3data (service description): re3data is a global registry of research data repositories from a diverse range of academic disciplines. It provides researchers, funding bodies, publishers and scholarly institutions with information on repositories for the permanent storage of, and access to, data sets.
Recommended data repositories: https://www.nature.com/sdata/policies/repositories
https://fairsharing.org/databases/
https://data.inra.fr/
Science Europe's Framework for Discipline-specific Research Data Management

https://data.inra.fr/

… 2,406 data repositories (Oct 10, 2019): https://www.re3data.org/metrics
Not FAIR!! FAIR?

Reproducible Research in the context of Open Science

Calls for Open Science & Reproducible Research
Typical examples of where problems can arise:
• Issues often arise when users jump straight into software implementations of methods (e.g. in R) that may lack documentation on the biases and assumptions mentioned in the original papers.
• A major, often overlooked cause of lack of repeatability is the wide sample-to-sample variability of the P value. Because the P value is fickle, the interpretation of analyses should not rest predominantly on this statistic. Halsey et al. (2015) The fickle P value generates irreproducible results, Nature Methods 12, 179–185.
• Overfitting occurs when a statistical model begins to describe the random error in the data rather than the relationships between variables. This happens when the model is too complex. In regression analysis, overfitting can produce misleading R-squared values, regression coefficients and p-values. https://statisticsbyjim.com/regression/overfitting-regression-models/
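The fickleness of the P value is easy to see by simulation: repeat the same experiment, with the same true effect, many times and look at how much the P value moves. A minimal sketch on synthetic data, assuming a true effect of 0.5 standard deviations and 20 samples per group (both hypothetical choices). To stay within the Python standard library it uses a two-sample z-test with a known standard deviation instead of the more usual t-test.

```python
import random
from statistics import NormalDist, fmean

def z_test_p(a, b, sigma=1.0):
    """Two-sided p-value of a two-sample z-test, assuming a known common SD sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(b) - fmean(a)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(2019)  # fixed seed: the simulation itself is reproducible
true_effect, n, replicates = 0.5, 20, 20
pvals = []
for _ in range(replicates):
    # Each replicate is the "same" experiment: same design, same true effect
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    pvals.append(z_test_p(control, treated))

print(f"smallest p = {min(pvals):.4f}, largest p = {max(pvals):.4f}")
print(f"replicates with p < 0.05: {sum(p < 0.05 for p in pvals)}/{replicates}")
```

Changing the seed changes which replicates cross the 0.05 threshold, which is precisely why a single P value is a weak basis for a conclusion.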

Calls for Open Science & Reproducible Research
Other issues:
• Loss of data and/or information: not regularly backing up your data is considered professional negligence.
• Lack of knowledge or technical skills, more or less hazardous practices: training is a right, but also a duty to claim in order to fully assume one's function or mission.
Miscellaneous:
• Continuous evolution of software libraries and their dependencies
• Problems related to numerical accuracy from one computer to another
• Versioning
• …
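One common source of numerical differences between computers is that floating-point addition is not associative: the same formula evaluated in a different order (for example by another library version, compiler or parallel reduction) can give slightly different results. A minimal sketch; the helper `same_result` and its tolerance are hypothetical choices illustrating that results should be compared with a tolerance rather than for exact equality.

```python
import math

values = [0.1, 0.2, 0.3]
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right, right_to_left, left_to_right == right_to_left)
# prints: 0.6000000000000001 0.6 False

# math.fsum tracks partial sums exactly and returns a correctly rounded total
print(math.fsum(values))  # prints: 0.6

def same_result(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare two numerical results with a relative tolerance (hypothetical threshold)."""
    return math.isclose(a, b, rel_tol=rel_tol)

print(same_result(left_to_right, right_to_left))  # prints: True
```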

What Science Requires
“Citations to unpublished data and personal communications cannot be used to support claims in a published paper.”
“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.”
Calls for Open Science & Reproducible Research

Reproducible Research
Research is defined as reproducible when the published results can be replicated using the documented data, code and methods employed by the author or provider, without the need for any additional information or for communicating with the author or provider. https://nnlm.gov/data/thesaurus/reproducible-research
Reproducible research is not a guarantee of research quality, but a guarantee of transparency: it contributes to quality but does not replace it.

Reproducible Research
Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

Reproducible Research: good practices
• Data collection and management:
  - Write an information collection protocol; this protocol should be part of the published article.
  - Maintain a laboratory notebook.
  - Collect data repeatedly AND reproducibly.
• Research compendium:
  - facilitates reproducible research by bringing together in a single virtual "place" the data, code, protocols and documentation related to a research project;
  - includes the full computational environment used to produce the results in the paper (code, data, etc.), which can be used to reproduce the results and to create new work based on the research.
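Part of capturing the computational environment can be automated by saving interpreter, operating-system and package versions next to the results in the compendium. A minimal sketch using only the Python standard library; the package names passed in are examples, and established tools such as `pip freeze`, `conda env export` or container images capture the environment more completely.

```python
import json
import platform
import sys
from importlib import metadata

def environment_snapshot(packages: list) -> dict:
    """Record interpreter, OS and package versions so results can be traced to an environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }

if __name__ == "__main__":
    # Example package names; in practice, list what the analysis actually imports
    snapshot = environment_snapshot(["numpy", "pandas"])
    print(json.dumps(snapshot, indent=2))
```

Writing this JSON into the compendium alongside the code and data lets a reader check, years later, which versions produced the published figures.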

Reproducible Research: good practices
Manage what? What kind of data/information?
The minimal but mandatory set of files, from RAW DATA to final results, including:
• Standard Operating Procedures (SOP)
• Data reporting
Checking, validation and tracing, from raw data to processed data.

Reproducible Research: good practices
Manage what? What kind of data/information?
Example: 1H-NMR analytical technique. The minimal but mandatory set of files (checking, validation, tracing):
• The final quantification results file
• The calibration file (calibration curves based on standard compounds)
• The Excel worksheet(s) used to calculate the quantification
• The compound attribution zones
• An image of an annotated NMR spectrum
• Protocol documents that describe each step of the process (Quality Assurance): I. analytical sample preparation; II. analytical processing; III. data processing; IV. quantification
• The raw NMR spectra (ZIP file)
Example of a full 1H-NMR data set: http://nmrprocflow.org/ex1
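The quantification step (calibration curve plus worksheet) can also be scripted, so that each number in the results file can be traced back to the standards it came from. A minimal sketch assuming a simple linear calibration (signal = slope × concentration + intercept); the standard concentrations and peak areas are invented for illustration, and real 1H-NMR quantification workflows may differ.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def quantify(signal, slope, intercept):
    """Invert the calibration curve: concentration that gives the measured signal."""
    return (signal - intercept) / slope

# Hypothetical standard compounds: concentrations (mM) and integrated peak areas (a.u.)
std_conc = [0.5, 1.0, 2.0, 4.0, 8.0]
std_area = [1.1, 2.1, 4.1, 8.1, 16.1]

slope, intercept = fit_line(std_conc, std_area)
print(f"area = {slope:.3f} * conc + {intercept:.3f}")
print(f"unknown sample: {quantify(5.1, slope, intercept):.2f} mM")
```

Unlike a worksheet, the script, the calibration file and the raw areas can be versioned and archived together.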
  • 41. Reproducible Research: Good practices. Backups: not regularly backing up your data is considered professional negligence. Versions and archives: safeguarding the successive stages of document development (texts, data, code, etc.) is one of the fundamental building blocks of reproducible research. Implement a version management strategy: Git with a local or institutional forge (e.g. Forgemia) or GitHub (e.g. github/INRA); research data repositories (re3data.org).
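Git remains the tool named above for version management. As a simple complement, for example to keep dated copies of successive stages of a data folder, an archive snapshot can be scripted. This is a minimal sketch, not a substitute for Git or for institutional backups; the function name snapshot, the label argument and the timestamp format are hypothetical choices.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot(project_dir, archive_dir, label="snapshot"):
    """Write a timestamped ZIP copy of project_dir into archive_dir.

    archive_dir must lie outside project_dir, otherwise later
    snapshots would include earlier archives.
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Second-level resolution: two snapshots in the same second share a name
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = archive_dir / f"{label}-{stamp}"
    # make_archive appends ".zip" and returns the path of the archive it wrote
    return Path(shutil.make_archive(str(base_name), "zip", root_dir=project_dir))
```

For example, snapshot("my_project", "/backup/my_project", label="before-reprocessing") would produce a file such as before-reprocessing-20191015-143000.zip; the archive folder should itself sit on backed-up storage.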
  • 42. Reproducible Research: Good advice. Data exploration: use tools that you know well or that make you more efficient. But learn to program: limit the use of graphical user interfaces (GUIs) for subtle or repetitive tasks; be able to express clearly, with documentation and without ambiguity, what you want the software to do. A program can often be written in only a few lines, and the higher the level of the language used, the less there is to write. Typical examples of reproducible research comprise compendia of data, code and text files, often organised around an R Markdown source document or a Jupyter notebook.
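As an illustration of replacing a repetitive GUI task with a few lines of code, the sketch below summarises one numeric column across every CSV file in a folder, instead of opening each file in a spreadsheet. It assumes each file has a header row and a column such as "concentration"; the column name, folder and function name are hypothetical.

```python
import csv
import statistics
from pathlib import Path


def summarise_folder(folder, column):
    """Return {file name: (n, mean, stdev)} for one numeric column of each CSV file."""
    summary = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as fh:
            values = [float(row[column]) for row in csv.DictReader(fh)]
        if not values:
            continue  # skip files that have a header but no data rows
        # The sample standard deviation needs at least two values
        sd = statistics.stdev(values) if len(values) > 1 else float("nan")
        summary[path.name] = (len(values), statistics.mean(values), sd)
    return summary
```

Because the same call, for example summarise_folder("data/processed", "concentration"), gives the same result every time it is run, the step can be documented, versioned and re-run, unlike a sequence of clicks.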
  • 43. Open Data for Access and Mining (ODAM) Framework: example of a data management system in the context of Open Science. http://pmb-bordeaux.fr/dataexplorer/ ; http://pmb-bordeaux.fr/odam/FAIR_and_DataLife_DJ_Oct2019.pdf ; https://nbviewer.jupyter.org/github/djacob65/binder_odam/blob/master/PyODAM_api_PCA.ipynb
  • 44. Some useful links related to Open Science / Data Management: Research Data - Digital Learning (https://doranum.fr/); CoopIST – Cooperate in Scientific and Technical Information (https://coop-ist.cirad.fr/gerer-des-donnees); INRA services and resources (https://www6.inra.fr/datapartage); The future of science is Open (https://www.fosteropenscience.eu/); Building the social and technical bridges to enable open sharing and re-use of data (https://www.rd-alliance.org/); 23 Things: Libraries for Research Data.
  • 45. Books online related to Reproducible Research: Vers une recherche reproductible : Faire évoluer ses pratiques (Towards reproducible research: changing one's practices), https://hal.archives-ouvertes.fr/hal-02144142v1 ; Reproducible Research with R and RStudio, Second Edition, https://englianhu.files.wordpress.com/2016/01/reproducible-research-with-r-and-studio-2nd-edition.pdf ; Reproducibility and Replicability in Science, https://www.nap.edu/catalog/25303/reproducibility-and-replicability-in-science