EUDAT Research Data Management | www.eudat.eu |

EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 www.eudat.eu
Research Data Management
Version 2
August 2016
This work is licensed under the Creative Commons CC-BY 4.0 licence

Overview
The changing data landscape
Managing and sharing research data
EUDAT services

THE CHANGING DATA LANDSCAPE
Image CC-BY-SA ‘data.path Ryoji.Ikeda - 3’ by r2hox www.flickr.com/photos/rh2ox/9990016123

Data explosion
More and more data is being created
The issue is not creating data, but being able to navigate and use it
Data management is critical to make sure data is well-organised, understandable and reusable
Image by ‘Coupmedia’ by http://www.coupmedia.com/resources/

Data loss
Digital data are fragile and susceptible to loss for a wide variety of reasons:
Natural disaster
Facilities infrastructure failure
Storage failure
Server hardware/software failure
Application software failure
Format obsolescence
Legal encumbrance
Human error
Malicious attack
Loss of staffing competencies
Loss of institutional commitment
Loss of financial stability
Changes in user expectations
Image CC-BY ‘Hard Drive 016’ by Jon Ross www.flickr.com/photos/jon_a_ross/1482849745

Data persistency issues
Link rot: more 404 errors generated over time
Reference rot*: link rot plus content drift, i.e. webpages evolving and no longer reflecting the original content cited
* Term coined by Hiberlink http://hiberlink.org
Jonathan D. Wren Bioinformatics 2008;24:1381-1385

A reproducibility crisis
Several studies have shown alarming numbers of published papers that don’t stand up to scrutiny
Nature special issue: http://www.nature.com/news/reproducibility-1.17552

Poor data management - science example
A wildlife biologist for a small field office was the in-house GIS expert and provided support for all the staff’s GIS needs. However, the data was stored on her own workstation. When the biologist relocated to another office, no one understood how the data was stored or managed.
Solution: A state office GIS specialist retrieved the workstation and sifted through files trying to salvage relevant data.
Cost: 1 work month ($4,000) plus the value of data that was not recovered
Consider that the situation could have been worse, because the data was not being backed up as it would have been if stored on a server.

Poor data management - federal example
In preparation for a Resource Management Plan, an office discovered 14 duplicate GPS inventories of roads. However, because none of the inventories had enough metadata, it was impossible to know which inventory was best or if any of the inventories actually met their requirements.
Solution: Re-inventory roads
Cost: Estimated 9 work months per inventory @ $4,000/wm (14 inventories = $504,000)
Image CC-BY ‘Minature fake highway interchange in Chicago’ by Ryan www.flickr.com/photos/ryanready/4692092024
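
The cost figure on this slide can be checked with a few lines of arithmetic. The following is a minimal sketch: only the three input figures (14 inventories, 9 work months each, $4,000 per work month) come from the slide, and the function and variable names are illustrative.

```python
# Figures taken from the slide; names are illustrative only.
INVENTORIES = 14
WORK_MONTHS_PER_INVENTORY = 9
COST_PER_WORK_MONTH_USD = 4_000


def reinventory_cost(inventories: int, work_months_each: int, rate_usd: int) -> int:
    """Total cost, in US dollars, of redoing every inventory."""
    return inventories * work_months_each * rate_usd


total = reinventory_cost(
    INVENTORIES, WORK_MONTHS_PER_INVENTORY, COST_PER_WORK_MONTH_USD
)
print(f"${total:,}")  # prints $504,000
```

Each inventory costs 9 x $4,000 = $36,000, so 14 of them come to $504,000, matching the slide.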

Why manage research data?
To make your research easier!
To stop yourself drowning in irrelevant stuff
In case you need the data later
To avoid accusations of fraud or bad science
To share your data for others to use and learn from
To get credit for producing it
Because funders or your organisation require it
Well-managed data opens up opportunities for re-use, integration and new science

MANAGING & SHARING DATA
Image CC-BY-SA by https://www.flickr.com/photos/notbrucelee/8016192302

Research data lifecycle
CREATING DATA: designing research, DMPs, planning consent, locating existing data, data collection and management, capturing and creating metadata
PROCESSING DATA: entering, transcribing, checking, validating and cleaning data, anonymising data, describing data, managing and storing data
ANALYSING DATA: interpreting and deriving data, producing outputs, authoring publications, preparing for sharing
PRESERVING DATA: data storage, back-up and archiving, migrating to best format and medium, creating metadata and documentation
GIVING ACCESS TO DATA: distributing data, sharing data, controlling access, establishing copyright, promoting data
RE-USING DATA: follow-up research, new research, undertaking research reviews, scrutinising findings, teaching and learning
Ref: UK Data Archive: http://www.data-archive.ac.uk/create-manage/life-cycle

What is a digital object?
Bitstream
Persistent Identifier
Metadata
Digital objects can be aggregated to digital collections
https://b2share.eudat.eu/record/1
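
The three parts of a digital object listed on this slide can be pictured as a small data structure. This is only an illustrative sketch, not how EUDAT or B2SHARE represent objects: the class names, field names and example identifiers are hypothetical, and giving the collection its own identifier is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    pid: str          # persistent identifier (format not specified on the slide)
    bitstream: bytes  # the raw content of the object
    metadata: dict    # descriptive key/value pairs


@dataclass
class DigitalCollection:
    pid: str  # assumption: a collection is identified like an object
    members: list = field(default_factory=list)

    def add(self, obj: DigitalObject) -> None:
        """Aggregate a digital object into this collection."""
        self.members.append(obj)


# Hypothetical example values.
obj = DigitalObject(
    pid="example-pid-0001",
    bitstream=b"site,temperature\nA,12.5\n",
    metadata={"title": "Example temperature readings"},
)
collection = DigitalCollection(pid="example-collection-0001")
collection.add(obj)
print(len(collection.members))  # prints 1
```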

Data Management Planning
A DMP is a brief plan to define:
• how the data will be created
• how it will be documented
• who will access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved
DMPs are often submitted as part of grant applications, but are useful whenever researchers are creating data.

Metadata and documentation
Metadata and documentation are needed to locate and understand research data
Think about what others would need in order to find, evaluate, understand, and reuse your data.
Get others to check the metadata to improve quality
Use standards to enable interoperability

Where to store your data?
Your own drive (PC, server, flash drive, etc.)
– And if you lose it? Or it breaks?
Somebody else’s drive / departmental drive
“Cloud” drive
– Do they care as much about your data as you do?
Large scale infrastructure services like EUDAT

How to back up?
3... 2... 1... backup!
– at least 3 copies of a file
– on at least 2 different media
– with at least 1 offsite
Use managed services where possible, e.g. university filestores or infrastructure services like EUDAT, rather than local or external hard drives
Ask IT teams for advice
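
The 3-2-1 rule above is simple enough to express as a check. The sketch below is illustrative only: the `FileCopy` class, its fields and the example storage locations are hypothetical, and a real backup audit would need to look at much more, such as whether the copies are current and readable.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileCopy:
    location: str  # where the copy lives, e.g. "laptop"
    medium: str    # kind of storage, e.g. "ssd", "tape"
    offsite: bool  # True if stored away from the primary site


def satisfies_3_2_1(copies: list[FileCopy]) -> bool:
    """At least 3 copies, on at least 2 different media, with at least 1 offsite."""
    media = {c.medium for c in copies}
    has_offsite = any(c.offsite for c in copies)
    return len(copies) >= 3 and len(media) >= 2 and has_offsite


copies = [
    FileCopy("laptop", "ssd", offsite=False),
    FileCopy("external drive", "hdd", offsite=False),
    FileCopy("university filestore", "network storage", offsite=True),
]
print(satisfies_3_2_1(copies))      # prints True
print(satisfies_3_2_1(copies[:2]))  # prints False: only 2 copies, none offsite
```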

Backup and preservation: not the same thing!
Backups
o Used to take periodic snapshots of data in case the current version is destroyed or lost
o Backups are copies of files stored for the short or near-long term
o Often performed on a somewhat frequent schedule
Archiving
o Used to preserve data for historical reference or potentially during disasters
o Archives are usually the final version, stored for the long term, and generally not copied over
o Often performed at the end of a project or during major milestones

The importance of sharing data
A mistake in a spreadsheet led to dramatically different results from those published.
These results were cited by the International Monetary Fund and the UK Treasury to justify austerity programmes.
Had the data been shared, this could have been picked up earlier.

Concerns About Data Sharing
In each case, the solution is metadata.
Concern: inappropriate use due to misunderstanding of research purpose or parameters
Solution: provide rich Abstract, Purpose, Use Constraints and Supplemental Information where needed
Concern: security and confidentiality of sensitive data
Solution: the metadata does NOT contain the data; Use Constraints specify who may access the data and how
Concern: lack of acknowledgement / credit
Solution: specify a required data citation within the Use Constraints
Concern: loss of data insight and competitive advantage when vying for research dollars
Solution: create a second, public version with a generalized Data Processing Description

Making data shareable
Create robust metadata that has been checked
Include reference information, e.g. unique IDs & properly formatted data citations
Publish your metadata so it’s discoverable. Use portals, clearing houses, online resources…
Package up the data and associated metadata to deposit in repositories
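
One way to make "metadata that has been checked" concrete is an automated completeness check before deposit. The sketch below is a hedged illustration, not a metadata standard: the required field list is an assumption loosely based on the elements these slides mention (unique ID, data citation, Abstract, Purpose, Use Constraints), and the record values are placeholders.

```python
# Assumed set of required fields, loosely based on elements named in the slides.
REQUIRED_FIELDS = ("identifier", "citation", "abstract", "purpose", "use_constraints")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Placeholder record; every value here is hypothetical.
record = {
    "identifier": "example-id-0001",
    "citation": "Author (Year). Title. Repository. Identifier.",
    "abstract": "GPS inventory of roads.",
    "purpose": "",  # left blank to show the check failing
    "use_constraints": "Cite the dataset when reusing it.",
}
print(missing_fields(record))  # prints ['purpose']
```

A check like this only catches empty fields; having a colleague read the metadata, as the earlier slide suggests, is still needed to judge whether it is understandable.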

Deciding what to preserve and share
It’s not possible to keep everything. Select based on:
What has to be kept e.g. data underlying publications
What can’t be recreated e.g. environmental recordings
What is potentially useful to others
What has scientific, cultural or historical value
What legally must be destroyed
How to select and appraise research data:
www.dcc.ac.uk/resources/how-guides/appraise-select-research-data

EUDAT SERVICE SUITE
Image CC-BY-NC ‘Data centre’ by Bob Mical www.flickr.com/photos/small_realm/15995555571

EUDAT services
EUDAT offers a pan-European solution, providing a generic set of services to ensure a minimum level of interoperability
Building common data services in close collaboration with 25+ communities

EUDAT B2 service suite
Covering both access and deposit, from informal data sharing to long-term archiving, and addressing identification, discoverability and computability of both long-tail and big data, EUDAT’s services will address the full lifecycle of research data

Support throughout the lifecycle
CREATING DATA
PROCESSING DATA
ANALYSING DATA
PRESERVING DATA
GIVING ACCESS TO DATA
RE-USING DATA

Thank you
www.eudat.eu
Authors / Contributors: Sarah Jones, Digital Curation Centre; Mark van de Sanden, SURFsara
Content has also been repurposed from the DataONE Educational modules, ‘Data Management’ and ‘Data Sharing’. Retrieved from https://www.dataone.org/education-modules
