Jisc Research Data Shared Service Open Repositories 2018 Paper
1. Jisc Research Data Shared Service
Looking at the past, looking to the future
2. Needs
»A better name…
“What's in a name?
That which we call a rose
By any other word would smell
as sweet;”
Prize available! @johnpkaye http://pngimg.com/download/651/?i=1
4. Who we are
07/06/2018 Building a National Data Service
Jisc is the UK higher education, further education and skills sectors’ not-for-profit organisation for digital services and solutions.
We…
» Operate shared digital infrastructure and services
» Provide trusted advice and practical assistance for universities, colleges and learning providers
» Negotiate sector-wide deals and conditions with IT vendors and commercial publishers
5. Jisc Digital Futures
Diagram of Jisc Digital Futures services: Store services; Playlists; Diagnostic tool builder; Curation and remix; Learner analytics services; Digital capability; Learning analytics; Digital launchpad; Apprentice workforce development; Digital leadership; Summer of student innovation; Analytics academy; Analytics labs; Qualification verification; App and content store; Research data discovery; Research data usage metrics; Equipment data; Repository and preservation platform; Research data shared service?
6. Open science
» Open Science: an umbrella term for a
technology and data driven systemic
change in how researchers work,
collaborate, share ideas, disseminate
and re-use results, by adopting the
core values that knowledge should be
reusable, modifiable and
redistributable. This allows us to
address the increasing demand
in society to address societal
challenges of our time.
https://vimeo.com/161464468
PV 2018 , open science research data
As open as possible and as closed as necessary
7. Why open science and research data sharing?
»Innovation and impact
»Meet global challenges with
new forms of research
»Supports research integrity
and better research
»Accountability
https://royalsociety.org/topics-policy/projects/science-public-enterprise/report/
http://doi.org/10.1371/journal.pone.0026828
8. G7 Science Ministers, Open Science, 2017
“We recognize that ICT developments, the digitisation and the vast availability of data, efforts
to push the science frontiers, and the need to address complex economic and societal
challenges, are transforming the way in which science is performed towards Open Science
paradigms. We agree that an international approach can help the speed and coherence of this
transition, and that it should target in particular two aspects. First, the incentives for the
openness of the research ecosystem: the evaluation of research careers should better
recognize and reward Open Science activities. Secondly, the infrastructures for an optimal
use of research data: all researchers should be able to deposit, access and analyse
scientific data across disciplines and at the global scale, and research data should adhere
to the FAIR principles of being findable, accessible, interoperable, and reusable.”
September 2017
http://www.g8.utoronto.ca/science/2017-science-communique.html
9. UK Policy
» Awareness of regulatory environment
» Data access statement
» Policies and processes
» Data storage
» Structured metadata descriptions
» DOIs for data
» Data securely preserved for a minimum of 10 years from last use
» University roadmaps in place in 2012; mandate in force from 1 May 2015
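The policy asks for structured metadata descriptions and DOIs for data. As a minimal sketch, assuming the DataCite Metadata Schema is used (its mandatory properties are Identifier, Creator, Title, Publisher, PublicationYear and ResourceType), a repository could check a dataset record before minting a DOI. The flat record layout, field names and example DOI below are hypothetical simplifications, not the DataCite JSON or XML format.

```python
from typing import Any, Dict, List

# Mandatory properties of the DataCite Metadata Schema (v4.x), expressed as
# hypothetical flat keys for this sketch.
MANDATORY = ("identifier", "creators", "titles", "publisher",
             "publication_year", "resource_type")


def missing_mandatory(record: Dict[str, Any]) -> List[str]:
    """Return the mandatory properties that are absent or empty in a record."""
    missing = []
    for key in MANDATORY:
        value = record.get(key)
        if value is None or value == "" or value == []:
            missing.append(key)
    return missing


def looks_like_doi(identifier: str) -> bool:
    """Loose check only: DOIs start with the '10.' prefix and contain a '/'."""
    return identifier.startswith("10.") and "/" in identifier


record = {
    "identifier": "10.1234/example-dataset",  # hypothetical DOI
    "creators": ["Researcher, A."],
    "titles": ["Example survey dataset"],
    "publisher": "Example University",
    "publication_year": 2018,
    "resource_type": "Dataset",
}

print(missing_mandatory(record))             # []
print(looks_like_doi(record["identifier"]))  # True
```

A record that fails this check would be sent back for more description before a DOI is registered, which is one way of making "findable" concrete.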
10. Solution
A shared research data service to meet universities' requirements and enable better management of research data
Web: https://www.jisc.ac.uk/rd/projects/research-data-shared-service
GitHub: https://github.com/JiscRDSS
Blog: https://researchdata.jiscinvolve.org/wp/
11. Research Data Preservation Challenge
Implementing Archivematica for research data preservation at York and Hull
Jenny Mitcham (Digital Archivist), University of York
26 April 2018, Jisc Research Support
13. Key researcher issues driving investment in preservation
Source: Jisc DAF Survey results 2016
Diagram stages: Capture & reuse; Preserve; Report; Advise & best practice
» Filling a gap: 75% of respondents look first to their institution to preserve their data
» Uptake of RDM: only 40% of respondents have a Research Data Management plan
» Advocacy: only 16% of respondents are currently accessing university RDM support services
» Metadata: only 18% of respondents say they follow established metadata guidelines
» Public datasets: >70% recognise that research is a public good and should be publicly released
» Sensitive data: 41% of respondents have some form of sensitive data
14. Preservation
“Support is woeful in the university currently, in particular
long-term data archiving is critically required. Most of my
non-current data is rotting on CD's and hard-drives.”
15. Preservation Challenges
»Automated preservation workflow
› Lack of resources
› Preservation sausage machine
»Interactive preservation workflow
› Work with the researcher and their data
»Appropriate Workflow
› What is an appropriate workflow?
16. This is it!
Architecture diagram. Components, connected via APIs:
» Jisc Repository Core Infrastructure: metadata store; publish/subscribe messaging service; cloud data storage (access and archival)
» Preservation systems
» Multi-tenant administration
» Discovery user interfaces and portals
» Tenant user interfaces
» Jisc reporting
» Tenant storage
» Tenant repository, CRIS and research systems
» Scholarly communications service APIs
17. Service workflow summary
Diagram components: repository; messaging layer; preservation service; reporting and analytics; archival data storage; national research data aggregation; institutional or external services.
1.a. Researcher deposits data, or
1.b. A record of data deposited externally is added
2. Data added to aggregation
3. Data is automatically preserved
4. Use of data and service is monitored
5. Other services are updated
6. Researchers find and reuse data
7. Data stored long term
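The workflow depends on a messaging layer: the repository announces a deposit once, and each downstream service reacts on its own. A minimal sketch of that publish/subscribe pattern follows, using a toy in-memory bus. The topic names, message bodies and step mapping are hypothetical and do not reproduce the RDSS message API; a production service would use a durable message broker rather than synchronous function calls.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    topic: str  # hypothetical topic name, e.g. "dataset.deposited"
    body: dict


@dataclass
class MessageBus:
    """Toy in-memory publish/subscribe bus standing in for the messaging layer."""
    handlers: dict = field(default_factory=lambda: defaultdict(list))

    def subscribe(self, topic: str, handler: Callable[[Message], None]) -> None:
        self.handlers[topic].append(handler)

    def publish(self, message: Message) -> None:
        # Every subscriber to the topic gets the message; the publisher does
        # not need to know which services are listening.
        for handler in list(self.handlers[message.topic]):
            handler(message)


bus = MessageBus()
log: list = []


def preserve(m: Message) -> None:
    # Step 3: preserve, then announce the result so storage can react (step 7).
    log.append(f"preserved {m.body['id']}")
    bus.publish(Message("dataset.preserved", m.body))


bus.subscribe("dataset.deposited", lambda m: log.append(f"aggregated {m.body['id']}"))  # step 2
bus.subscribe("dataset.deposited", preserve)                                              # step 3
bus.subscribe("dataset.deposited", lambda m: log.append(f"reported {m.body['id']}"))     # step 4
bus.subscribe("dataset.preserved", lambda m: log.append(f"stored {m.body['id']}"))       # step 7

# Step 1.a: the repository publishes a single event when a researcher deposits data.
bus.publish(Message("dataset.deposited", {"id": "ds-001"}))
print(log)
# ['aggregated ds-001', 'preserved ds-001', 'stored ds-001', 'reported ds-001']
```

The design point is decoupling: adding another subscriber (for example a new reporting service) needs no change to the repository that publishes the event.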
19. Demo
Pre-recorded short demo showing data being uploaded into a test Samvera repository instance, with automatic ingest into Preservica.
https://www.youtube.com/watch?v=d-l1ARNUwWA&t=13s
23. Production Service
Nick Youngson - CC BY-SA - http://www.thebluediamondgallery.com/handwriting/i/insurance-requirements.html
»Scalable
»Sustainable
»Intuitive
24. Production Priorities
»First priority is research data
› Research output (Article/Thesis etc.)
› Research data
› Research software/code
› Provenance metadata (method)
» But also…
› Preservation systems tailored for
multiple digital objects and data types
› Use cases and pilots for objects
beyond research data
https://creativecommons.org/licenses/by/2.0/
https://www.flickr.com/photos/cogdog/
25. Production: where are we now?
The architecture diagram from slide 16 is repeated here: the Jisc Repository Core Infrastructure (metadata store, publish/subscribe messaging service, cloud data storage for access and archival) connected via APIs to preservation systems, multi-tenant administration, discovery and tenant user interfaces, tenant storage, Jisc reporting, tenant repository, CRIS and research systems, and scholarly communications service APIs.
27. Part of the Jisc family
Diagram: the research publication lifecycle (Submission, Acceptance, Publication, Use) and the Jisc services that support it.
Lifecycle activities: Select journal; Check compliance; Manage costs; Deposit in repository; Report on compliance; Maximise impact; Record reach; Record impact
Jisc services: SHERPA JULIET; SHERPA RoMEO; SHERPA REF; SHERPA Fact; Monitor UK; Monitor Local; Jisc Collections; OpenDOAR; Publications Router; CORE; IRUS-UK; RIOXX; ORCID support; OpenAIRE; NOAD; Research Data Shared Service; Metrics lab experiment
28. Three standard service options
» End-to-end service
» Repository service
» Preservation service
Service to be launched in autumn 2018.
All three options include:
» Financial benefits
» Standards
» Advisory
» Network membership
33. Preservation action registry (PAR)
Arkivum, Artefactual, Preservica, Open Preservation Foundation & Jisc
» Sharing information on digital formats, actions (e.g. conversion) and tools (e.g. JHOVE) for preservation via APIs
» Exchange between systems and organisations on preservation policies and solutions
» Capture best practice and give confidence for preservation actions, accelerate adoption of digital preservation
» Prototype based on Archivematica’s Format Policy Register and Preservica’s Linked Data Registry as part of the Jisc research data shared service
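PAR exchanges information about formats, the actions that can be applied to them and the tools that perform those actions. A minimal sketch of that idea as a local lookup table follows. The `Action` fields, the use of MIME types as format identifiers and the registry entries are all hypothetical and do not reproduce the PAR API or data model; the TIFF-to-JPEG access conversion mirrors the example given in the speaker notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    format_id: str                       # hypothetical: MIME type as format identifier
    action_type: str                     # e.g. "validate", "convert"
    tool: str                            # tool that executes the action
    purpose: str                         # "preservation" or "access"
    target_format: Optional[str] = None  # only set for conversions


# Hypothetical entries for illustration; not data from the PAR prototype.
REGISTRY = [
    Action("image/tiff", "validate", "JHOVE", "preservation"),
    Action("image/tiff", "convert", "ImageMagick", "access", "image/jpeg"),
    Action("application/pdf", "validate", "veraPDF", "preservation"),
]


def actions_for(format_id: str, purpose: Optional[str] = None) -> List[Action]:
    """Return registered actions for a format, optionally filtered by purpose."""
    return [a for a in REGISTRY
            if a.format_id == format_id and (purpose is None or a.purpose == purpose)]


for action in actions_for("image/tiff"):
    print(action.action_type, action.tool, action.target_format)
# validate JHOVE None
# convert ImageMagick image/jpeg
```

Sharing such entries between systems, rather than each system keeping its own table, is what lets one organisation's tested action give others confidence to apply it.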
34. Jisc national shared research platform
Information sources
» Publications Router
» Publishers
» Crossref
» ORCID
» DataCite
» PubMed
» Sherpa policy tools
University systems
» Single Sign-On, Finance, HR…
Information destinations
» Google etc.
» Discovery services
» Jisc CORE (global OA aggregation)
» Jisc Monitor (compliance checking)
» Jisc Collections
» Funders' systems
» OpenAIRE + for EU
Preservation services
Reports and dashboards
University X, Y and Z repositories, each holding Open Access publications and research datasets
36. jisc.ac.uk
John Kaye
Head of Change – Research
john.kaye@jisc.ac.uk
@johnpkaye
Speaker notes
A definition of open science and its scope: open science covers the whole research lifecycle, from creation to collaboration and analysis, and is as open as possible and as closed as necessary (for example, where there is sensitive data or rights to first use).
One key way to describe open science is through the FAIR principles: findable, accessible, interoperable and reusable. It is not just about having access to a paper or data; it is about being able to reuse them for multiple purposes.
The "Spanish cucumber" E. coli outbreak: the genome was analysed within weeks of the outbreak because of a global, open effort; data about the strain’s genome sequence were released freely over the internet as soon as they were produced. We often see this in human-related emergencies, so imagine if this applied to more research, more widely.
Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results
This is a global, international effort, and the G7 see it as important, with a focus on incentives, change towards open practice, and infrastructure/FAIR.
The vision here is also very similar to that of the European Open Science Cloud (EOSC). Jisc is active in the EOSC, which is relevant to Jisc remaining an essential part of the global research infrastructure.
The French president recently stated that publicly funded research data should be open by default. This was alongside the AI and innovation agenda: for AI to meet its potential, open data is needed.
Research organisations have primary responsibility for ensuring that researchers manage their data effectively. They need established infrastructure and processes to ensure:
Retained EPSRC-funded research data is preserved for a minimum of ten years
Effective data curation is provided throughout full data lifecycle
Knowledge of publicly-funded research data holdings
Discoverability; recording of third party access requests
Notice and justification of access restrictions, for example ‘commercially confidential’
Awareness and use of relevant law, for example FOI
Awareness and compliance with research data policies
Adequate RDM resource allocation for example from quality-related research (QR) funding or research grants
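The ten-year minimum counts from the last use of the data, so every recorded access restarts the clock. A minimal sketch of when a dataset first becomes eligible for a retention review follows, assuming the repository keeps a list of access dates per dataset (a hypothetical data model). The leap-day handling is an assumption of this sketch, not a statement of how EPSRC or any university interprets the rule.

```python
from datetime import date
from typing import List

RETENTION_YEARS = 10  # "a minimum of 10 years from last use"


def add_years(d: date, years: int) -> date:
    """Add calendar years; 29 February becomes 28 February in a non-leap target year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only 29 February can fail here (target year is not a leap year).
        return d.replace(year=d.year + years, day=28)


def earliest_review_date(access_dates: List[date]) -> date:
    """Each recorded use restarts the clock, so count from the most recent use."""
    return add_years(max(access_dates), RETENTION_YEARS)


def may_review(access_dates: List[date], today: date) -> bool:
    """True once the minimum retention period has passed since the last use."""
    return today >= earliest_review_date(access_dates)


uses = [date(2015, 5, 1), date(2019, 11, 20)]
print(earliest_review_date(uses))  # 2029-11-20
```

Reaching this date only makes a dataset eligible for review; it does not imply the data should be deleted.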
More effective research data management is needed to comply with funder mandates, to ensure data is not lost, and to realise a whole range of benefits
A shared service (provided by Jisc) seems to offer a number of benefits:
Cost savings and efficiencies
Common approaches and practice – do this together
Research system standardisation and interoperability (do it once rather than many times, and address it across essential systems so we can key once and share)
Address market gaps
The long tail
The long tail of unidentifiable files that we will have to deal with
Mention Jenny Mitcham's statistics: around 60% of items in the RDM collection were unidentifiable using existing workflows
PDFs: easy to deal with, as the problem has been solved by global initiatives, e.g. JHOVE, veraPDF
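Identification tools such as DROID match byte signatures from the PRONOM registry. A much-simplified sketch of signature matching follows, using a handful of well-known "magic numbers" rather than PRONOM signatures, to show how the unidentified share of a collection can be measured. The signature table and sample headers are illustrative only.

```python
from typing import List, Optional

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"\xff\xd8\xff": "JPEG",
    b"PK\x03\x04": "ZIP container",
}


def identify(header: bytes) -> Optional[str]:
    """Return a format name if the file header starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def unidentified_share(headers: List[bytes]) -> float:
    """Fraction of files whose header matches no known signature."""
    if not headers:
        return 0.0
    unknown = sum(1 for h in headers if identify(h) is None)
    return unknown / len(headers)


sample = [b"%PDF-1.7 ...", b"II*\x00\x08\x00", b"\x00\x01instrument-output", b"x,y\n1,2\n"]
print([identify(h) for h in sample])  # ['PDF', 'TIFF (little-endian)', None, None]
print(unidentified_share(sample))     # 0.5
```

Plain-text formats such as CSV and proprietary instrument output often carry no signature at all, which is one reason research data produces such a long tail of unidentified files.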
We worked with a lot of people!
How and why we’ve got to where we are
Pilots
Worked with 16 pilot institutions of various sizes
Created a procurement framework
Pilots
17 institutions
Cross spectrum use cases
Large and small
Collaborative development
Multiple suppliers and solution vendors
Drivers
More than £5 million investment over 2 years
Open access
Sector defined requirements
“R@R” co-design
Over half the HEI sector involved
Automated preservation workflow:
Some institutions lack the resources (staff, skills, budget, time) to do a comprehensive job of preserving all their research datasets and will instead want a low-cost, fully automated, 'black box' approach to digital preservation of at least some of their data.
They want a 'preservation sausage machine' whereby research data is fed in at one end and out of the other comes 'preservation packages' containing the research data in a form that is better described and structured for long-term usability.
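The "sausage machine" takes data in and produces a better-structured package. The deck does not specify the package format, so as an illustration only, here is a minimal sketch that wraps a dataset in a BagIt bag (RFC 8493), one widely used packaging convention: a `data/` payload directory, a `bagit.txt` declaration and a SHA-256 manifest. The function names are hypothetical, and a real preservation system would add format identification, validation and richer metadata.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy a dataset into a minimal BagIt bag with a SHA-256 payload manifest.

    Simplification: file names containing CR, LF or '%' would need
    percent-encoding in the manifest, which is not handled here.
    """
    payload = bag_dir / "data"
    shutil.copytree(source_dir, payload)  # creates bag_dir/data
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    lines = []
    for file in sorted(p for p in payload.rglob("*") if p.is_file()):
        rel = file.relative_to(bag_dir).as_posix()  # e.g. "data/results.csv"
        lines.append(f"{sha256_of(file)}  {rel}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "dataset"
    src.mkdir()
    (src / "results.csv").write_bytes(b"x,y\n1,2\n")
    bag = make_bag(src, Path(tmp) / "bag")
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    declaration = (bag / "bagit.txt").read_text(encoding="utf-8")

print(manifest)
```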
Interactive preservation workflow:
Some institutions will want to work closely with both the research data and the researcher as part of an iterative process of quality control and digital preservation.
This is a more interactive and resource intensive process than 'automated' preservation, but can yield better results and may be more appropriate for specific types of research or institution.
The most appropriate workflow to use will depend on many factors, e.g. the experience an institution has with digital preservation, the resources at its disposal, the research discipline or type of data involved, the requirements of the research funder, the institution's policy and so on.
How and why we’ve got to where we are
Text and
Where are we now
Multi tenant user interface
Workflows
Obligatory slide that can't be read by anyone
It wouldn’t be an RDSS presentation without it
See Dom if you want to read it
Data model, APIs, reference (example) APIs and data available to develop against
First priority is research data
However, research data is not enough on its own for research publication; the ideal is to store, or link to, the complete research package:
Research output (Article/Thesis etc.)
Research data
Research software/code
Provenance metadata (method)
Without this, research is often not reusable or repeatable.
But… we are aware that we are providing preservation systems that are tailored to deal with all kinds of digital objects and data.
We are exploring use cases and pilots for objects beyond research data.
» Scalable multi-tenant repository and preservation infrastructure for multiple content types
» Intuitive user experience - automated workflows
» Financially sustainable
» Multiple storage options
» Event-based open APIs
» Reporting APIs and dashboards
Where we are now
Core architecture
Multi-tenant database
Interoperability layer
Data model
Proof-of-concept front-end API
Initial front-end design
Where are we now
Multi tenant user interface
Workflows
Integrations with open access service
Core architecture consisting of a multi-tenant database with technical, descriptive and event metadata.
Proof-of-concept API that shows we can put a lightweight front end on the multi-tenant core architecture
Begun front-end design with our design and user experience specialists
Integrations with open access service
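The core architecture is described as a multi-tenant database holding technical, descriptive and event metadata. A minimal sketch of the shared-table approach to multi-tenancy follows, in which every row carries a tenant identifier and every query filters on it. The table layout, tenant names and use of SQLite are assumptions for illustration, not the RDSS data model.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metadata (
        tenant_id TEXT NOT NULL,
        object_id TEXT NOT NULL,
        kind      TEXT NOT NULL CHECK (kind IN ('technical', 'descriptive', 'event')),
        name      TEXT NOT NULL,
        value     TEXT NOT NULL
    )""")


def add(tenant: str, obj: str, kind: str, name: str, value: str) -> None:
    conn.execute("INSERT INTO metadata VALUES (?, ?, ?, ?, ?)",
                 (tenant, obj, kind, name, value))


def metadata_for(tenant: str, obj: str) -> List[Tuple[str, str, str]]:
    """Every query is scoped by tenant, so one institution never sees another's rows."""
    rows = conn.execute(
        "SELECT kind, name, value FROM metadata "
        "WHERE tenant_id = ? AND object_id = ? ORDER BY kind, name",
        (tenant, obj))
    return rows.fetchall()


add("uni-x", "ds-001", "descriptive", "title", "Survey data")
add("uni-x", "ds-001", "technical", "format", "text/csv")
add("uni-x", "ds-001", "event", "ingest", "2018-06-07")
add("uni-y", "ds-001", "descriptive", "title", "Another tenant's dataset")

print(metadata_for("uni-x", "ds-001"))
# [('descriptive', 'title', 'Survey data'), ('event', 'ingest', '2018-06-07'), ('technical', 'format', 'text/csv')]
```

Both tenants use the object identifier `ds-001`, yet each sees only its own record; the alternative design of one database per tenant gives stronger isolation at a higher operational cost.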
* This year, the University of Westminster will be piloting RDSS within their existing information infrastructure.
* Since 2013, the University has been using Haplo Research Manager as their Virtual Research Environment.
* As well as CRIS, postgraduate research workflows and research ethics functions, it provides the University's repository, integrating REF and Open Access management.
* It has excellent buy-in from researchers as a "one stop shop" for research information with a compelling user experience.
* This is their current setup, with a custom hybrid Haplo/EPrints repository.
* Information flows from University systems into Haplo.
* Researchers self-deposit outputs into the Haplo Repository module, and a workflow helps them work with the metadata team to prepare their outputs for publication.
* When published, outputs are pushed to EPrints, which provides the public view of the repository.
* The researcher's full research profile, including publications, is pushed to the corporate web site.
* By August 2018, Westminster will be using an "all Haplo" repository.
* EPrints will be retired, and Haplo will provide the public interface of the Westminster repository.
* The project is led by the Research and Scholarly Communications team, based in Library and Archives at UoW, in collaboration with University of Westminster academics.
* The move is motivated by a desire to use a standard repository: the Haplo core product with a small amount of local configuration on top.
* Haplo Repository provides many "building blocks", and Westminster have selected the ones most appropriate for their research, in particular collections workflow, support for practice-based research, and REF & OA reporting.
* Westminster are working with two other institutions to define common workflows and reporting, aiming to launch their new repositories at roughly the same time.
* By the end of the year, Westminster will be using RDSS services for preservation.
* Haplo already has initial support for the RDSS message bus.
* The Haplo developers like the RDSS because a single interface allows integration with many different research data components: "we get many integrations for the price of one".
* The flexibility of the RDSS allows Westminster to pick the components they need and easily integrate them into their research information infrastructure.
Actions: checksums, file conversion (for normalisation or rendering, e.g. TIFF to JPEG)
These act on objects and create an AIP (Archival Information Package)
Actions are executed by tools, e.g. DROID, JHOVE
Tools may be general-purpose or discipline-specific
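Checksums appear in the notes as a typical preservation action. A minimal sketch of a fixity check follows: recompute a digest with Python's standard `hashlib` and compare it with the value recorded at ingest. The `FixityResult` type and function names are hypothetical; in a real workflow the preservation system would record each result as a preservation event.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FixityResult:
    object_id: str
    expected: str
    actual: str

    @property
    def ok(self) -> bool:
        return self.expected == self.actual


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest using any algorithm hashlib supports (e.g. "md5", "sha256")."""
    return hashlib.new(algorithm, data).hexdigest()


def fixity_check(object_id: str, data: bytes, expected: str,
                 algorithm: str = "sha256") -> FixityResult:
    """Recompute the checksum and compare it with the stored value."""
    return FixityResult(object_id, expected.lower(), checksum(data, algorithm))


stored = checksum(b"original bytes")  # recorded at ingest, e.g. in the AIP
print(fixity_check("ds-001", b"original bytes", stored).ok)   # True
print(fixity_check("ds-001", b"corrupted bytes", stored).ok)  # False
```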