SlideShare a Scribd company logo
1 of 47
Managing and Sharing Research Data
Sarah Jones
DCC, University of Glasgow
sarah.jones@glasgow.ac.uk
Twitter: @sjDCC
#fosteropenscience
Open Science Days 2015, 21st & 23rd April, Prague & Brno http://foster.czu.cz/?r=6661
RDM, OPEN SCIENCE, OPEN DATA...
Some definitions and clarifications
Image CC-BY-NC-SA by Tom Magllery www.flickr.com/photos/lwr/13442910354
What is Research Data Management?
The active management of data throughout the lifecycle
Create
Document
Use
Store
Share
Preserve
• Data Management Planning
• Creating data
• Documenting data
• Accessing / using data
• Storage and backup
• Selecting what to keep
• Sharing data
• Data licensing and citation
• Preserving data
• …
CC-BY-NC-SACC-BY-NC-SA
What is open science?
“science carried out and communicated in a
manner which allows others to contribute,
collaborate and add to the research effort, with
all kinds of data, results and protocols made
freely available at different stages of the
research process.”
Research Information Network, Open Science case studies
www.rin.ac.uk/our-work/data-management-and-curation/
open-science-case-studies
Open data
 make your stuff available on the Web (whatever format) under an open licence
 make it available as structured data (e.g. Excel instead of a scan of a table)
 use non-proprietary formats (e.g. CSV instead of Excel)
 use URIs to denote things, so that people can point at your stuff
 link your data to other data to provide context
Tim Berners-Lee’s proposal for five star open data - http://5stardata.info
“Open data and content can be freely used,
modified and shared by anyone for any purpose”
http://opendefinition.org
Data sharing: degrees of openness
Open Restricted Closed
Content that can be
freely used, modified
and shared by anyone
for any purpose
Limits on who can use the data,
how or for what purpose
- Charges for use
- Data sharing agreements
- Restrictive licences
- Peer-to-peer exchange
- …
Five star open data

Unable to share
Under embargo
WHY MANAGE AND SHARE DATA?
Benefits and drivers
Image CC-BY-NC-SA by wonderwebby www.flickr.com/photos/wonderwebby/2723279491
Why YOU need a Data
Management Plan
http://blogs.ch.cam.ac.uk/pmr/2
011/08/01/why-you-need-a-data-
management-plan
What if this was your laptop?
Data sharing boosts citation
A study that analysed the citation counts of 10,555 papers on gene
expression studies that created microarray data, showed:
“studies that made data available in a public repository
received 9% more citations than similar studies for
which the data was not made available”
Data reuse and the open data citation advantage,
Piwowar, H. & Vision, T. https://peerj.com/articles/175
Returns for institutions
“If an institution spent A$10 million on data, what would
be the return? The answer is: more publications; an
increased citation count; more grants; greater profile;
and more collaboration.”
Dr Ross Wilkinson, ANDS
www.ariadne.ac.uk/issue72/oar-2013-rpt
Expectations of public access
“Publicly funded research data are a public good,
produced in the public interest, which should be made
openly available with as few restrictions as possible in a
timely and responsible manner that does not harm
intellectual property.”
RCUK Common Principles on Data Policy
www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
Funder imperatives...
“The European Commission’s vision is
that information already paid for by the
public purse should not be paid for
again each time it is accessed or used,
and that it should benefit European
companies and citizens to the full.”
http://ec.europa.eu/research/participants/data/
ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-
oa-pilot-guide_en.pdf
Increased use and economic benefit
Up to 2008
• Sold through the US Geological
Survey for US$600 per scene
• Sales of 19,000 scenes per year
• Annual revenue of $11.4 million
Since 2009
• Freely available over the internet
• Google Earth now uses the images
• Transmission of 2,100,000
scenes per year.
• Estimated to have created value for
the environmental management
industry of $935 million, with direct
benefit of more than $100 million
per year to the US economy
• Has stimulated the development of
applications from a large number of
companies worldwide
The case of NASA Landsat satellite imagery of the Earth’s surface:
http://earthobservatory.nasa.gov/IOTD/view.php?id=83394&src=ve
Validation of results
www.guardian.co.uk/politics/2013/apr/18/uncovered-error-george-osborne-austerity
“It was a mistake in a spreadsheet that could
have been easily overlooked: a few rows left
out of an equation to average the values in a
column.
The spreadsheet was used to draw the
conclusion of an influential 2010 economics
paper: that public debt of more than 90% of
GDP slows down growth. This conclusion was
later cited by the International Monetary Fund
and the UK Treasury to justify programmes
of austerity that have arguably led to riots,
poverty and lost jobs.”
More scientific breakthroughs
www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0
“It was unbelievable. Its not science
the way most of us have practiced in
our careers. But we all realised that
we would never get biomarkers unless
all of us parked our egos and
intellectual property noses outside
the door and agreed that all of our
data would be public immediately.”
Dr John Trojanowski, University of Pennsylvania
Reasons to manage and share data
Direct benefits for you
• To make your research
easier!
• Stop yourself drowning in
irrelevant stuff
• Make sure you can
understand and reuse
your data again later
• Advance your career –
data is growing in
significance
Research integrity
• To avoid accusations of
fraud or bad science
• Evidence findings and
enable validation of
research methods
• Meet codes of practice
on research conduct
• Many research funders
worldwide now require
Data Management and
Sharing Plans
Potential to share data
• So others can reuse and
build on your data
• To gain credit – several
studies have shown
higher citation rates
when data are shared
• For greater visibility,
impact and new
research collaborations
• Promote innovation and
allow research in your
field to advance faster
HOW TO MANAGE RESEARCH DATA?
Questions to consider
Image CC-BY-NC-SA by Leo Reynolds www.flickr.com/photos/lwr/13442910354
What file formats are most appropriate?
• Do you have a choice or do the instruments you use only export in certain
formats?
• What is common in your field? Try to use something that is accepted and
widespread
• Does your data centre recommend formats? If so it’s best to use these.
If you want your data to be re-used and sustainable in the long-term, you
typically want to opt for open, non-proprietary formats.
Type Recommended Avoid for data sharing
Tabular data CSV, TSV, SPSS portable Excel
Text Plain text, HTML, RTF
PDF/A only if layout matters
Word
Media Container: MP4, Ogg
Codec: Theora, Dirac, FLAC
Quicktime
H264
Images TIFF, JPEG2000, PNG GIF, JPG
Structured data XML, RDF RDBMS
www.data-archive.ac.uk/create-manage/format/formats-table
How will you name your files?
• Keep file and folder names short, but meaningful
• Agree a method for versioning
• Include dates in a set format e.g. YYYYMMDD
• Avoid using non-alphanumeric characters in file names
• Use hyphens or underscores not spaces e.g. day-sheet, day_sheet
• Order the elements in the most appropriate way to retrieve the record
www.jiscdigitalmedia.ac.uk/guide/choosing-a-file-name
Example from ARM Climate
Research Facility
www.arm.gov/data/docs/plan
Can others understand the data?
Think about what is needed in order to find, evaluate,
understand, and reuse the data.
• Have you documented what you did and how?
• Did you develop code to run analyses? If so, this should
be kept and shared too.
• Is it clear what each bit of your dataset means? Make
sure the units are labelled and abbreviations explained.
• Record metadata so others can find your work e.g.
title, date, creator(s), subject, format, rights…,
Where will you store the data?
• Your own device (laptop, flash drive, server etc.)
– And if you lose it? Or it breaks?
• Departmental drives or university servers
• “Cloud” storage
– Do they care as much about your data as you do?
The decision will be based on how sensitive your data are,
how robust you need the storage to be, and
who needs access to the data and when
Who will do the backup?
• Use managed services where possible (e.g. University
filestores rather than local or external hard drives), so
backup is done automatically
• 3… 2… 1… backup!
at least 3 copies of a file
on at least 2 different media
with at least 1 offsite
• Ask central IT team for advice
Which data need to be kept?
Five steps to follow
① Could this data be re-used
② Must it be kept as evidence or for legal reasons
③ Should it be kept for its potential value
④ Consider costs – do benefits outweigh cost?
⑤ Evaluate criteria to decide what to keep
5 steps to decide what data to keep
www.dcc.ac.uk/resources/how-guides/five-steps-decide-what-data-keep
Can you publish / share your data?
• Who owns the data?
• Have you got consent for sharing?
• Do any licences you’ve signed permit sharing?
• Is the data in suitable formats?
• Is there enough documentation?
Data Management Plans
It’s useful to consider these issues and the approaches you will
take in a Data Management Plan. Many research funders and
universities now ask for these.
• What types of data will the project generate/collect?
• What standards will be used?
• How will this data be shared/made available?
• If not, why? e.g. ethics & IP issues, embargoes, confidentiality
• How will this data be curated and preserved?
www.dcc.ac.uk/resources/data-management-plans/checklist
SUPPORT FOR IMPLEMENTATION
Good practice, tools, infrastructure & services
Image under licence by DCC
Managing and sharing data: a best
practice guide
http://data-archive.ac.uk/media/2894/managingsharing.pdf
Support on Data Management Plans
• Checklist on what to include
• How to guide on developing a plan
• Guidance on assessing plans (forthcoming)
• Webinars and training materials
• DMPonline tool
• Example DMPs
www.dcc.ac.uk/resources/data-management-plans
DMPonline
• Presents requirements from funders
• Guidance from funder, uni, discipline…
• Example answers
• Ability to share plans with collaborators
• Export into a variety of formats
• …
https://dmponline.dcc.ac.uk
Metadata standards catalogue
• Good metadata is key for research data access and re-use
• Many disciplines have formalised community metadata standards
• Use relevant standards for interoperability
www.dcc.ac.uk/resources/metadata-standards
www.dcc.ac.uk/resources/how-guides/license-research-data
Data licensing guidance and tools
Check out the EUDAT licensing wizard:
http://ufal.github.io/lindat-license-selector
Data repositories
http://databib.org
http://service.re3data.org/search
• Does your publisher or funder suggest a repository?
• Are there data centres or community databases for your discipline?
• Does your university offer support for long-term preservation?
HOW ARE OTHER UNIS SUPPORTING RDM?
Examples from the UK
Image CC-BY by GotCredit www.flickr.com/photos/jakerust/16844834832
Components of RDM services
www.dcc.ac.uk/resources/developing-rdm-services
Institutional RDM policies
www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies
Early research data policies
“Statement of commitment”
 Infrastructure  policy
“10 commandments”
mutual promises
aspirational
Baseline of RCUK Code
+ procedures & support
legal tone / language
a section in uni DM policy
useful guide as appendix
Based on Edin.
with a few
additions
RDM strategies and roadmaps
A series of blog posts
www.dcc.ac.uk/news
Links to example roadmaps
http://tiny.cc/EPSRCroadmaps
University of Bath RDM roadmap
• Based on Monash University RDM strategy
• Identifies the current position and proposes activity
• Defines roles and responsibilities and timeframes
www.bath.ac.uk/rdso/University-of-Bath-Roadmap-for-EPSRC.pdf
Guidance webpages
www.gla.ac.uk/datamanagement
www.bath.ac.uk/research/data
Online training
http://datalib.edina.ac.uk/mantra
Training for librarians
• Data Intelligence for librarians, 3TU, Netherlands
http://dataintelligence.3tu.nl/en/about-the-course
• DIY Training Kit for Librarians, University of Edinburgh
http://datalib.edina.ac.uk/mantra/libtraining.html
• RDM for librarians, DCC
http://www.dcc.ac.uk/training/rdm-librarians
• RDMRose, University of Sheffield
http://rdmrose.group.shef.ac.uk
• SupportDM modules, University of East London
http://www.uel.ac.uk/trad/outputs/resources
Data Management Planning support
• Guidelines / templates on what to include in plans
• Example answers, guidance and links to local support
• A library of successful DMPs to reuse
• Tailored consultancy services
• Online tools (e.g. customised DMPonline)
• Links / flags embedded in grant systems
• ...
Research data storage
Blue Peta at Bristol
1st 5TB free per Data Steward then
£400 per TB p.a. for disk storage;
tape backup £40 per TB
http://data.bris.ac.uk
• Use of HPC facility
• £2m funding to date
• 3 machine rooms – resilience
(tape archive 2012)
• Available to all researchers
for research data
Data storage at Hertfordshire
• Acquired 100TB of tier 2 storage which will be backed
up to tape for device level recovery
• Default offer will be 50GB but established an RDM
triage system and will accommodate < 5TB on the
basis of need
• 10TB of Arkivum A-stor for 10 years for archival
storage bolted onto Dspace institutional repository in
order to support long term preservation of datasets
www.herts.ac.uk/rdm
Institutional data repositories
Not intended to
replace national,
subject or other
established data
repositories
Acknowledge
hybrid environment
http://datashare.is.ed.ac.uk
www.dspace.cam.ac.uk
https://databank.ora.ox.ac.uk
Research Data at Essex &
DataPool at Southampton
Data catalogues
• Use of Current Research Information Systems (CRIS) e.g.
PURE and Symplectic
• Use of repository platforms e.g. ePrints, Dspace, CKAN
• No definitive standard – use of DataCite, CERIF, DDI…
• UK national Research Data Discovery Service pilot:
http://rdds.jiscinvolve.org/wp
Thanks – any questions?
Follow us on Twitter:
@fosterscience and #fosteropenscience
FOSTER training events and materials:
www.fosteropenscience.eu/events

More Related Content

What's hot

Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - RangerIsheeta Sanghi
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science ProcessVishal Patel
 
What is data engineering?
What is data engineering?What is data engineering?
What is data engineering?yongdam kim
 
Intro to Data Management Plans
Intro to Data Management PlansIntro to Data Management Plans
Intro to Data Management PlansSarah Jones
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
DAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and GovernanceDAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and GovernanceDATAVERSITY
 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesDATAVERSITY
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance StrategyAnalytics8
 
Scaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksScaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksDatabricks
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Metadata Strategies
Metadata StrategiesMetadata Strategies
Metadata StrategiesDATAVERSITY
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshConfluentInc1
 
Introduction to SQL Server Security
Introduction to SQL Server SecurityIntroduction to SQL Server Security
Introduction to SQL Server SecurityJason Strate
 
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...Edureka!
 
Forget Big Data. It's All About Smart Data
Forget Big Data. It's All About Smart DataForget Big Data. It's All About Smart Data
Forget Big Data. It's All About Smart DataAlan McSweeney
 
Practical Guide to Data Governance Success
Practical Guide to Data Governance SuccessPractical Guide to Data Governance Success
Practical Guide to Data Governance SuccessAmple Insight Inc
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 

What's hot (20)

Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
 
Data Lifecycle Management
Data Lifecycle ManagementData Lifecycle Management
Data Lifecycle Management
 
What is data engineering?
What is data engineering?What is data engineering?
What is data engineering?
 
Intro to Data Management Plans
Intro to Data Management PlansIntro to Data Management Plans
Intro to Data Management Plans
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
DAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and GovernanceDAS Slides: Master Data Management – Aligning Data, Process, and Governance
DAS Slides: Master Data Management – Aligning Data, Process, and Governance
 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success Stories
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance Strategy
 
Scaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksScaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with Databricks
 
Why data governance is the new buzz?
Why data governance is the new buzz?Why data governance is the new buzz?
Why data governance is the new buzz?
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Metadata Strategies
Metadata StrategiesMetadata Strategies
Metadata Strategies
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
 
Introduction to SQL Server Security
Introduction to SQL Server SecurityIntroduction to SQL Server Security
Introduction to SQL Server Security
 
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
Who is a Data Scientist? | How to become a Data Scientist? | Data Science Cou...
 
Forget Big Data. It's All About Smart Data
Forget Big Data. It's All About Smart DataForget Big Data. It's All About Smart Data
Forget Big Data. It's All About Smart Data
 
Practical Guide to Data Governance Success
Practical Guide to Data Governance SuccessPractical Guide to Data Governance Success
Practical Guide to Data Governance Success
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 

Viewers also liked

Research data challenge presentation
Research data challenge presentationResearch data challenge presentation
Research data challenge presentationJisc
 
RDM LIASA webinar
RDM LIASA webinarRDM LIASA webinar
RDM LIASA webinarSarah Jones
 
H2020 Open Research Data pilot
H2020 Open Research Data pilotH2020 Open Research Data pilot
H2020 Open Research Data pilotSarah Jones
 
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu | Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu | EUDAT
 
Bristol's Research Data Service - Debra Hiom - Jisc Digital Festival 2014
Bristol's Research Data Service - Debra Hiom - Jisc Digital Festival 2014Bristol's Research Data Service - Debra Hiom - Jisc Digital Festival 2014
Bristol's Research Data Service - Debra Hiom - Jisc Digital Festival 2014Jisc
 
Research data policy
Research data policyResearch data policy
Research data policySarah Jones
 
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)OpenAIRE
 
Entstehung der Forschungsdaten-Policy an der Humboldt-Universität zu Berlin
Entstehung der Forschungsdaten-Policy an der Humboldt-Universität zu BerlinEntstehung der Forschungsdaten-Policy an der Humboldt-Universität zu Berlin
Entstehung der Forschungsdaten-Policy an der Humboldt-Universität zu BerlinElena Simukovic
 

Viewers also liked (9)

Cni2012
Cni2012Cni2012
Cni2012
 
Research data challenge presentation
Research data challenge presentationResearch data challenge presentation
Research data challenge presentation
 
RDM LIASA webinar
RDM LIASA webinarRDM LIASA webinar
RDM LIASA webinar
 
H2020 Open Research Data pilot
H2020 Open Research Data pilotH2020 Open Research Data pilot
H2020 Open Research Data pilot
 
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu | Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
Research Data Management Introduction: EUDAT/Open AIRE Webinar| www.eudat.eu |
 
Bristol's Research Data Service - Debra Hiom - Jisc Digital Festival 2014
Bristol's Research Data Service - Debra Hiom - Jisc Digital Festival 2014Bristol's Research Data Service - Debra Hiom - Jisc Digital Festival 2014
Bristol's Research Data Service - Debra Hiom - Jisc Digital Festival 2014
 
Research data policy
Research data policyResearch data policy
Research data policy
 
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
 
Entstehung der Forschungsdaten-Policy an der Humboldt-Universität zu Berlin
Entstehung der Forschungsdaten-Policy an der Humboldt-Universität zu BerlinEntstehung der Forschungsdaten-Policy an der Humboldt-Universität zu Berlin
Entstehung der Forschungsdaten-Policy an der Humboldt-Universität zu Berlin
 

Similar to Managing and sharing data

Data Management and Horizon 2020
Data Management and Horizon 2020Data Management and Horizon 2020
Data Management and Horizon 2020Sarah Jones
 
Managing and sharing data
Managing and sharing dataManaging and sharing data
Managing and sharing dataSarah Jones
 
20160523 23 Research Data Things
20160523 23 Research Data Things20160523 23 Research Data Things
20160523 23 Research Data ThingsKatina Toufexis
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...Projeto RCAAP
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementSarah Jones
 
OU Library Research Support webinar: Data sharing
OU Library Research Support webinar: Data sharingOU Library Research Support webinar: Data sharing
OU Library Research Support webinar: Data sharingDaniel Crane
 
DataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data SharingDataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data SharingDataONE
 
DMP health sciences
DMP health sciencesDMP health sciences
DMP health sciencesSarah Jones
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonAfrican Open Science Platform
 
Getting to grips with Research Data Management
Getting to grips with Research Data ManagementGetting to grips with Research Data Management
Getting to grips with Research Data ManagementIzzyChad
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
Getting to grips with research data management
Getting to grips with research data management Getting to grips with research data management
Getting to grips with research data management Wendy Mears
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environmentphilipdurbin
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management IzzyChad
 
RDM for Librarians
RDM for LibrariansRDM for Librarians
RDM for LibrariansMarieke Guy
 

Similar to Managing and sharing data (20)

Data Management and Horizon 2020
Data Management and Horizon 2020Data Management and Horizon 2020
Data Management and Horizon 2020
 
Intro to RDM
Intro to RDMIntro to RDM
Intro to RDM
 
Managing and sharing data
Managing and sharing dataManaging and sharing data
Managing and sharing data
 
20160523 23 Research Data Things
20160523 23 Research Data Things20160523 23 Research Data Things
20160523 23 Research Data Things
 
The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...The state of global research data initiatives: observations from a life on th...
The state of global research data initiatives: observations from a life on th...
 
DC101 UWE
DC101 UWEDC101 UWE
DC101 UWE
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
OU Library Research Support webinar: Data sharing
OU Library Research Support webinar: Data sharingOU Library Research Support webinar: Data sharing
OU Library Research Support webinar: Data sharing
 
DataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data SharingDataONE Education Module 02: Data Sharing
DataONE Education Module 02: Data Sharing
 
Research Data Management and your PhD
Research Data Management and your PhDResearch Data Management and your PhD
Research Data Management and your PhD
 
Research-Data-Management-and-your-PhD
Research-Data-Management-and-your-PhDResearch-Data-Management-and-your-PhD
Research-Data-Management-and-your-PhD
 
DMP health sciences
DMP health sciencesDMP health sciences
DMP health sciences
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon Hodson
 
Getting to grips with Research Data Management
Getting to grips with Research Data ManagementGetting to grips with Research Data Management
Getting to grips with Research Data Management
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Getting to grips with research data management
Getting to grips with research data management Getting to grips with research data management
Getting to grips with research data management
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environment
 
Open Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon HodsonOpen Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon Hodson
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management
 
RDM for Librarians
RDM for LibrariansRDM for Librarians
RDM for Librarians
 

More from Sarah Jones

Data training tips and tricks
Data training tips and tricksData training tips and tricks
Data training tips and tricksSarah Jones
 
EOSC and libraries
EOSC and librariesEOSC and libraries
EOSC and librariesSarah Jones
 
EOSC Association priorities and activities
EOSC Association priorities and activitiesEOSC Association priorities and activities
EOSC Association priorities and activitiesSarah Jones
 
Managing and sharing data: lessons from the European context
Managing and sharing data: lessons from the European contextManaging and sharing data: lessons from the European context
Managing and sharing data: lessons from the European contextSarah Jones
 
Reflections on Open Science
Reflections on Open ScienceReflections on Open Science
Reflections on Open ScienceSarah Jones
 
MAR comments analysis
MAR comments analysisMAR comments analysis
MAR comments analysisSarah Jones
 
Introduction to Open Science and EOSC
Introduction to Open Science and EOSCIntroduction to Open Science and EOSC
Introduction to Open Science and EOSCSarah Jones
 
EOSC-MAR-update.pptx
EOSC-MAR-update.pptxEOSC-MAR-update.pptx
EOSC-MAR-update.pptxSarah Jones
 
Why is EOSC so hard?
Why is EOSC so hard?Why is EOSC so hard?
Why is EOSC so hard?Sarah Jones
 
The future of FAIR
The future of FAIRThe future of FAIR
The future of FAIRSarah Jones
 
Data Management Planning for researchers
Data Management Planning for researchersData Management Planning for researchers
Data Management Planning for researchersSarah Jones
 
Is Europe ready for Open Science
Is Europe ready for Open ScienceIs Europe ready for Open Science
Is Europe ready for Open ScienceSarah Jones
 
DMPonline: 10 years, 10 lessons
DMPonline: 10 years, 10 lessonsDMPonline: 10 years, 10 lessons
DMPonline: 10 years, 10 lessonsSarah Jones
 
Do & don't of supporting Open Science
Do & don't of supporting Open ScienceDo & don't of supporting Open Science
Do & don't of supporting Open ScienceSarah Jones
 
Why institutions need to raise their capabilities to support FAIR
Why institutions need to raise their capabilities to support FAIRWhy institutions need to raise their capabilities to support FAIR
Why institutions need to raise their capabilities to support FAIRSarah Jones
 
It takes more than a village: lessons on building global research commons
It takes more than a village: lessons on building global research commonsIt takes more than a village: lessons on building global research commons
It takes more than a village: lessons on building global research commonsSarah Jones
 
DMPTuuli - what's new?
DMPTuuli - what's new?DMPTuuli - what's new?
DMPTuuli - what's new?Sarah Jones
 
DCC and FAIR initiatives
DCC and FAIR initiativesDCC and FAIR initiatives
DCC and FAIR initiativesSarah Jones
 
Reflections on EOSC through the mirror of ARDC
Reflections on EOSC through the mirror of ARDCReflections on EOSC through the mirror of ARDC
Reflections on EOSC through the mirror of ARDCSarah Jones
 

More from Sarah Jones (20)

Data training tips and tricks
Data training tips and tricksData training tips and tricks
Data training tips and tricks
 
EOSC and libraries
EOSC and librariesEOSC and libraries
EOSC and libraries
 
EOSC Association priorities and activities
EOSC Association priorities and activitiesEOSC Association priorities and activities
EOSC Association priorities and activities
 
Managing and sharing data: lessons from the European context
Managing and sharing data: lessons from the European contextManaging and sharing data: lessons from the European context
Managing and sharing data: lessons from the European context
 
Reflections on Open Science
Reflections on Open ScienceReflections on Open Science
Reflections on Open Science
 
MAR comments analysis
MAR comments analysisMAR comments analysis
MAR comments analysis
 
Introduction to Open Science and EOSC
Introduction to Open Science and EOSCIntroduction to Open Science and EOSC
Introduction to Open Science and EOSC
 
EOSC-MAR-update.pptx
EOSC-MAR-update.pptxEOSC-MAR-update.pptx
EOSC-MAR-update.pptx
 
Intro-EOSC.pptx
Intro-EOSC.pptxIntro-EOSC.pptx
Intro-EOSC.pptx
 
Why is EOSC so hard?
Why is EOSC so hard?Why is EOSC so hard?
Why is EOSC so hard?
 
The future of FAIR
The future of FAIRThe future of FAIR
The future of FAIR
 
Data Management Planning for researchers
Data Management Planning for researchersData Management Planning for researchers
Data Management Planning for researchers
 
Is Europe ready for Open Science
Is Europe ready for Open ScienceIs Europe ready for Open Science
Is Europe ready for Open Science
 
DMPonline: 10 years, 10 lessons
DMPonline: 10 years, 10 lessonsDMPonline: 10 years, 10 lessons
DMPonline: 10 years, 10 lessons
 
Do & don't of supporting Open Science
Do & don't of supporting Open ScienceDo & don't of supporting Open Science
Do & don't of supporting Open Science
 
Why institutions need to raise their capabilities to support FAIR
Why institutions need to raise their capabilities to support FAIRWhy institutions need to raise their capabilities to support FAIR
Why institutions need to raise their capabilities to support FAIR
 
It takes more than a village: lessons on building global research commons
It takes more than a village: lessons on building global research commonsIt takes more than a village: lessons on building global research commons
It takes more than a village: lessons on building global research commons
 
DMPTuuli - what's new?
DMPTuuli - what's new?DMPTuuli - what's new?
DMPTuuli - what's new?
 
DCC and FAIR initiatives
DCC and FAIR initiativesDCC and FAIR initiatives
DCC and FAIR initiatives
 
Reflections on EOSC through the mirror of ARDC
Reflections on EOSC through the mirror of ARDCReflections on EOSC through the mirror of ARDC
Reflections on EOSC through the mirror of ARDC
 

Recently uploaded

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Recently uploaded (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Managing and sharing data

  • 1. Managing and Sharing Research Data Sarah Jones DCC, University of Glasgow sarah.jones@glasgow.ac.uk Twitter: @sjDCC #fosteropenscience Open Science Days 2015, 21st & 23rd April, Prague & Brno http://foster.czu.cz/?r=6661
  • 2. RDM, OPEN SCIENCE, OPEN DATA... Some definitions and clarifications Image CC-BY-NC-SA by Tom Magllery www.flickr.com/photos/lwr/13442910354
  • 3. What is Research Data Management? The active management of data throughout the lifecycle Create Document Use Store Share Preserve • Data Management Planning • Creating data • Documenting data • Accessing / using data • Storage and backup • Selecting what to keep • Sharing data • Data licensing and citation • Preserving data • … CC-BY-NC-SACC-BY-NC-SA
  • 4. What is open science? “science carried out and communicated in a manner which allows others to contribute, collaborate and add to the research effort, with all kinds of data, results and protocols made freely available at different stages of the research process.” Research Information Network, Open Science case studies www.rin.ac.uk/our-work/data-management-and-curation/ open-science-case-studies
  • 5. Open data  make your stuff available on the Web (whatever format) under an open licence  make it available as structured data (e.g. Excel instead of a scan of a table)  use non-proprietary formats (e.g. CSV instead of Excel)  use URIs to denote things, so that people can point at your stuff  link your data to other data to provide context Tim Berners-Lee’s proposal for five star open data - http://5stardata.info “Open data and content can be freely used, modified and shared by anyone for any purpose” http://opendefinition.org
  • 6. Data sharing: degrees of openness Open Restricted Closed Content that can be freely used, modified and shared by anyone for any purpose Limits on who can use the data, how or for what purpose - Charges for use - Data sharing agreements - Restrictive licences - Peer-to-peer exchange - … Five star open data  Unable to share Under embargo
  • 7. WHY MANAGE AND SHARE DATA? Benefits and drivers Image CC-BY-NC-SA by wonderwebby www.flickr.com/photos/wonderwebby/2723279491
  • 8. Why YOU need a Data Management Plan http://blogs.ch.cam.ac.uk/pmr/2 011/08/01/why-you-need-a-data- management-plan What if this was your laptop?
  • 9. Data sharing boosts citation A study that analysed the citation counts of 10,555 papers on gene expression studies that created microarray data, showed: “studies that made data available in a public repository received 9% more citations than similar studies for which the data was not made available” Data reuse and the open data citation advantage, Piwowar, H. & Vision, T. https://peerj.com/articles/175
  • 10. Returns for institutions “If an institution spent A$10 million on data, what would be the return? The answer is: more publications; an increased citation count; more grants; greater profile; and more collaboration.” Dr Ross Wilkinson, ANDS www.ariadne.ac.uk/issue72/oar-2013-rpt
  • 11. Expectations of public access “Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.” RCUK Common Principles on Data Policy www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
  • 12. Funder imperatives... “The European Commission’s vision is that information already paid for by the public purse should not be paid for again each time it is accessed or used, and that it should benefit European companies and citizens to the full.” http://ec.europa.eu/research/participants/data/ ref/h2020/grants_manual/hi/oa_pilot/h2020-hi- oa-pilot-guide_en.pdf
  • 13. Increased use and economic benefit Up to 2008 • Sold through the US Geological Survey for US$600 per scene • Sales of 19,000 scenes per year • Annual revenue of $11.4 million Since 2009 • Freely available over the internet • Google Earth now uses the images • Transmission of 2,100,000 scenes per year. • Estimated to have created value for the environmental management industry of $935 million, with direct benefit of more than $100 million per year to the US economy • Has stimulated the development of applications from a large number of companies worldwide The case of NASA Landsat satellite imagery of the Earth’s surface: http://earthobservatory.nasa.gov/IOTD/view.php?id=83394&src=ve
  • 14. Validation of results www.guardian.co.uk/politics/2013/apr/18/uncovered-error-george-osborne-austerity “It was a mistake in a spreadsheet that could have been easily overlooked: a few rows left out of an equation to average the values in a column. The spreadsheet was used to draw the conclusion of an influential 2010 economics paper: that public debt of more than 90% of GDP slows down growth. This conclusion was later cited by the International Monetary Fund and the UK Treasury to justify programmes of austerity that have arguably led to riots, poverty and lost jobs.”
  • 15. More scientific breakthroughs www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0 “It was unbelievable. Its not science the way most of us have practiced in our careers. But we all realised that we would never get biomarkers unless all of us parked our egos and intellectual property noses outside the door and agreed that all of our data would be public immediately.” Dr John Trojanowski, University of Pennsylvania
  • 16. Reasons to manage and share data Direct benefits for you • To make your research easier! • Stop yourself drowning in irrelevant stuff • Make sure you can understand and reuse your data again later • Advance your career – data is growing in significance Research integrity • To avoid accusations of fraud or bad science • Evidence findings and enable validation of research methods • Meet codes of practice on research conduct • Many research funders worldwide now require Data Management and Sharing Plans Potential to share data • So others can reuse and build on your data • To gain credit – several studies have shown higher citation rates when data are shared • For greater visibility, impact and new research collaborations • Promote innovation and allow research in your field to advance faster
  • 17. HOW TO MANAGE RESEARCH DATA? Questions to consider Image CC-BY-NC-SA by Leo Reynolds www.flickr.com/photos/lwr/13442910354
  • 18. What file formats are most appropriate? • Do you have a choice or do the instruments you use only export in certain formats? • What is common in your field? Try to use something that is accepted and widespread • Does your data centre recommend formats? If so it’s best to use these. If you want your data to be re-used and sustainable in the long-term, you typically want to opt for open, non-proprietary formats. Type Recommended Avoid for data sharing Tabular data CSV, TSV, SPSS portable Excel Text Plain text, HTML, RTF PDF/A only if layout matters Word Media Container: MP4, Ogg Codec: Theora, Dirac, FLAC Quicktime H264 Images TIFF, JPEG2000, PNG GIF, JPG Structured data XML, RDF RDBMS www.data-archive.ac.uk/create-manage/format/formats-table
  • 19. How will you name your files? • Keep file and folder names short, but meaningful • Agree a method for versioning • Include dates in a set format e.g. YYYYMMDD • Avoid using non-alphanumeric characters in file names • Use hyphens or underscores not spaces e.g. day-sheet, day_sheet • Order the elements in the most appropriate way to retrieve the record www.jiscdigitalmedia.ac.uk/guide/choosing-a-file-name Example from ARM Climate Research Facility www.arm.gov/data/docs/plan
  • 20. Can others understand the data? Think about what is needed in order to find, evaluate, understand, and reuse the data. • Have you documented what you did and how? • Did you develop code to run analyses? If so, this should be kept and shared too. • Is it clear what each bit of your dataset means? Make sure the units are labelled and abbreviations explained. • Record metadata so others can find your work e.g. title, date, creator(s), subject, format, rights…,
  • 21. Where will you store the data? • Your own device (laptop, flash drive, server etc.) – And if you lose it? Or it breaks? • Departmental drives or university servers • “Cloud” storage – Do they care as much about your data as you do? The decision will be based on how sensitive your data are, how robust you need the storage to be, and who needs access to the data and when
  • 22. Who will do the backup? • Use managed services where possible (e.g. University filestores rather than local or external hard drives), so backup is done automatically • 3… 2… 1… backup! at least 3 copies of a file on at least 2 different media with at least 1 offsite • Ask central IT team for advice
  • 23. Which data need to be kept? Five steps to follow ① Could this data be re-used ② Must it be kept as evidence or for legal reasons ③ Should it be kept for its potential value ④ Consider costs – do benefits outweigh cost? ⑤ Evaluate criteria to decide what to keep 5 steps to decide what data to keep www.dcc.ac.uk/resources/how-guides/five-steps-decide-what-data-keep
  • 24. Can you publish / share your data? • Who owns the data? • Have you got consent for sharing? • Do any licences you’ve signed permit sharing? • Is the data in suitable formats? • Is there enough documentation?
  • 25. Data Management Plans It’s useful to consider these issues and the approaches you will take in a Data Management Plan. Many research funders and universities now ask for these. • What types of data will the project generate/collect? • What standards will be used? • How will this data be shared/made available? • If not, why? e.g. ethics & IP issues, embargoes, confidentiality • How will this data be curated and preserved? www.dcc.ac.uk/resources/data-management-plans/checklist
  • 26. SUPPORT FOR IMPLEMENTATION Good practice, tools, infrastructure & services Image under licence by DCC
  • 27. Managing and sharing data: a best practice guide http://data-archive.ac.uk/media/2894/managingsharing.pdf
  • 28. Support on Data Management Plans • Checklist on what to include • How to guide on developing a plan • Guidance on assessing plans (forthcoming) • Webinars and training materials • DMPonline tool • Example DMPs www.dcc.ac.uk/resources/data-management-plans
  • 29. DMPonline • Presents requirements from funders • Guidance from funder, uni, discipline… • Example answers • Ability to share plans with collaborators • Export into a variety of formats • … https://dmponline.dcc.ac.uk
  • 30. Metadata standards catalogue • Good metadata is key for research data access and re-use • Many disciplines have formalised community metadata standards • Use relevant standards for interoperability www.dcc.ac.uk/resources/metadata-standards
  • 31. www.dcc.ac.uk/resources/how-guides/license-research-data Data licensing guidance and tools Check out the EUDAT licensing wizard: http://ufal.github.io/lindat-license-selector
  • 32. Data repositories http://databib.org http://service.re3data.org/search • Does your publisher or funder suggest a repository? • Are there data centres or community databases for your discipline? • Does your university offer support for long-term preservation?
  • 33. HOW ARE OTHER UNIS SUPPORTING RDM? Examples from the UK Image CC-BY by GotCredit www.flickr.com/photos/jakerust/16844834832
  • 34. Components of RDM services www.dcc.ac.uk/resources/developing-rdm-services
  • 36. Early research data policies “Statement of commitment”  Infrastructure  policy “10 commandments” mutual promises aspirational Baseline of RCUK Code + procedures & support legal tone / language a section in uni DM policy useful guide as appendix Based on Edin. with a few additions
  • 37. RDM strategies and roadmaps A series of blog posts www.dcc.ac.uk/news Links to example roadmaps http://tiny.cc/EPSRCroadmaps
  • 38. University of Bath RDM roadmap • Based on Monash University RDM strategy • Identifies the current position and proposes activity • Defines roles and responsibilities and timeframes www.bath.ac.uk/rdso/University-of-Bath-Roadmap-for-EPSRC.pdf
  • 41. Training for librarians • Data Intelligence for librarians, 3TU, Netherlands http://dataintelligence.3tu.nl/en/about-the-course • DIY Training Kit for Librarians, University of Edinburgh http://datalib.edina.ac.uk/mantra/libtraining.html • RDM for librarians, DCC http://www.dcc.ac.uk/training/rdm-librarians • RDMRose, University of Sheffield http://rdmrose.group.shef.ac.uk • SupportDM modules, University of East London http://www.uel.ac.uk/trad/outputs/resources
  • 42. Data Management Planning support • Guidelines / templates on what to include in plans • Example answers, guidance and links to local support • A library of successful DMPs to reuse • Tailored consultancy services • Online tools (e.g. customised DMPonline) • Links / flags embedded in grant systems • ...
  • 43. Research data storage Blue Peta at Bristol 1st 5TB free per Data Steward then £400 per TB p.a. for disk storage; tape backup £40 per TB http://data.bris.ac.uk • Use of HPC facility • £2m funding to date • 3 machine rooms – resilience (tape archive 2012) • Available to all researchers for research data
  • 44. Data storage at Hertfordshire • Acquired 100TB of tier 2 storage which will be backed up to tape for device level recovery • Default offer will be 50GB but established an RDM triage system and will accommodate < 5TB on the basis of need • 10TB of Arkivum A-stor for 10 years for archival storage bolted onto Dspace institutional repository in order to support long term preservation of datasets www.herts.ac.uk/rdm
  • 45. Institutional data repositories Not intended to replace national, subject or other established data repositories Acknowledge hybrid environment http://datashare.is.ed.ac.uk www.dspace.cam.ac.uk https://databank.ora.ox.ac.uk Research Data at Essex & DataPool at Southampton
  • 46. Data catalogues • Use of Current Research Information Systems (CRIS) e.g. PURE and Symplectic • Use of repository platforms e.g. ePrints, Dspace, CKAN • No definitive standard – use of DataCite, CERIF, DDI… • UK national Research Data Discovery Service pilot: http://rdds.jiscinvolve.org/wp
  • 47. Thanks – any questions? Follow us on Twitter: @fosterscience and #fosteropenscience FOSTER training events and materials: www.fosteropenscience.eu/events

Editor's Notes

  1. The Research Information Network ran a series of open science case studies a few years ago to look at openness across different disciplines. These included astronomy, bioinformatics, chemistry, clinical neuroimaging, epidemiology and language technology. Through this they came up with the definition. They defined open science as … The first aspect is about conducting science in an open way to encourage others to engage with it, and also sharing the outputs at various stages. I’ll unpack this definition over the next few slides
  2. Open science is also about making data available. Open data is data that can be used, modified and shared by anyone for any purpose. So for example it can be mined, exploited, reproduced, commercialised... Data under a non-commercial license wouldn’t be open as you’re restricting how it can be used and by whom. There are different levels of openness. Making content available on the web, in any format, under an open license is open data. It’s not necessarily that reusable though. By making it available as structured data, in non-proprietary formats, using URIs and linking to other data for context, makes it much more valuable.
  3. There’s also a citation advantage for individual researchers. This study by Heather Piwowar and Todd Vision looked at 10,555 paper of gene expression studies that had shared the associated microarray data. Those studies that shared data received 9% more citations.
  4. He was making a comparison with the Hubble telescope, which A$1.5 billion is spent on each year. The cost of the Hubble archive (A$1 million per annum) is just a fraction of this, but given the OA mandate, they’ve see the research publications produced by Hubble discoveries double.
  5. The background to this is about making the most of the data that has been created through publicly funded research. The guidelines speak of: Improved quality of results Greater efficiency Faster to market = faster growth Improved transparency of the scientific process
  6. There’s also an economic benefit, as seen by the case of the NASA landsat satellite images. These were sold until 2008 for $600 a scene. Now they’re freely available and used by Google Earth. Previously they sold 19,000 images a year, whereas now they transmit 2.1 million. The revenue has gone up incredibly too from $11.4 million to an estimated value of $935 million with direct benefit of more than $100 million. The release has also stimulated the development of applications from companies worldwide. This case study comes from the Royal Society Report on Science as an Open Enterprise.
  7. The validation of results is key. This article references an inadvertent error in a economics paper by Reinhadt and Rogoff. Missing some rows out of an average gave drastically different results – what was published suggested that countries with 90% debt ratios see their economies shrink by 0.1%. Instead, it should have found that they grow by 2.2% – less than those with lower debt ratios, but not a spiralling collapse. This mistake wasn’t picked up on initially as the data hadn’t been shared. The mistake fed into government policy - the findings of this paper were used as justification for austerity measures in the UK and various other countries in the EU.
  8. Certain research communities have also seen the benefit of sharing data as it speeds up the process of discovery. This article shows how researchers in the field of Alzheimer’s research have agreed as a community to share data immediately to make scientific breakthroughs.
  9. Learn good data habits now! You’ll need them later.
  10. Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach e.g. the limitations of some CC options