Shaun will give an overview of the EUDAT service suite, explaining the key functions and roles of the different B2 services and how they interconnect. Examples will be given of how each service has been used by communities, to illustrate the nature and scale of the service provision. We show how the B2 Service Suite can be linked to the data lifecycle and the role of each component in data management planning. By the end of this talk, users should have a good overview of each of the B2 services, how they do, or will, fit together, and how they can be used as part of a coherent data management plan.
Visit https://eudat.eu/eudat-summer-school
EUDAT Service Suite Overview - EUDAT Summer School (Shaun de Witt, CCFE)
1. www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
The EUDAT Service Suite
Shaun de Witt
2. Learning Objective
Why Data Management is Important
Get an overview of the EUDAT services and how they link together
4. The Importance of Data Management
Research Infrastructure trends:
Internationalisation
Diversification
(Figure timeline: Middle Ages, 19th century, 20th century, 21st century)
Large Scale Projects:
SKA (300PB/yr)
LHC (600PB/yr – run 4)
Human Brain Project (21PB)
IPCC/CMIP (10PB ->150PB)
ITER (600PB/yr)
XFEL (10PB/yr)
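To put the yearly volumes above into perspective, the short sketch below converts them into an average sustained data rate. It assumes decimal units (1 PB = 10^15 bytes) and a 365-day year; the function name is illustrative only, and real instruments produce data in bursts rather than at a constant rate.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def sustained_rate_gb_per_s(petabytes_per_year: float) -> float:
    """Average rate in GB/s needed to move a yearly volume (decimal units)."""
    bytes_per_year = petabytes_per_year * 1e15
    return bytes_per_year / SECONDS_PER_YEAR / 1e9


# Yearly volumes as quoted on the slide.
yearly_volumes_pb = {"SKA": 300, "LHC (run 4)": 600, "ITER": 600, "XFEL": 10}

for project, volume in yearly_volumes_pb.items():
    rate = sustained_rate_gb_per_s(volume)
    print(f"{project}: {rate:.2f} GB/s on average")
```

On these assumptions, 600PB/yr corresponds to roughly 19 GB/s sustained around the clock, which is why moving and storing such data needs planning rather than ad hoc copying.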
5. Data Scientist Core Skills (Research Oriented)
From EDISON Data Science Framework
7. Data Management (9000-1,000BC)
Good
Excellent long term preservation with
the right materials
Cheapish materials
Difficult to corrupt/overwrite
Bad
No key to interpretation
VERY slow write rates
Difficult to re-use in a global
environment
Main Uses
Laws, accounting and taxes
8. Data Management (2500BC-present)
Good
Easy to manufacture materials
Much finer details than previous effort
(better bit density)
OK long term preservation
Improved data movement
Bad
Needs right conditions
Easily lost/fragile
Cataloguing and indexing was initially a
problem
Main Uses
Laws, accounting and taxes
Sketching, writing, love letters, paper
airplanes…
9. Data Management (1AD-present)
Good
Easy to manufacture materials
Simpler to organise
OK long term preservation
Highly portable
Bad
Needs right conditions
Easily lost/fragile
Optical Character Recognition not
Invented
Loss of knowledge over time
Main Uses
Meeting minutes, meeting actions,
doodling, Remembering the date,
10. Data Management (1890s – 1990s)
Good
Excellent longevity
Very high data density (35mm frame is
equivalent to about 40 Mpixels)
Highly portable
Reproducible
Bad
Sometimes takes several attempts to
produce good data
Fragile
Subject to noise
Not very metadata rich (difficult to
index)
Main Uses
Pretty astronomical pictures, Family
albums, Selfies…
12. Data Management (1950s – 1990s)
Good
Really programmable
Proper digital.
Reproducible and Transportable
Bad
Tapes and disk were
Low data density
Not designed to any standard
Main Uses
Maths, Bad 1970’s movie backdrops,
Calculating taxes
(Figure labels: 140MB, 170MB, 1.25MB/s, ~1MW, ~1MB memory, ~16 MIPS)
14. Data Management (1960s - present)
Good
Powerful
Scalable
Bad
MS-DOS, Windows 3.1, OS-2
Main Uses
Chuckie Egg, Solitaire, Manic Miner
(Figure labels: ~15MW, ~1.3PB memory, ~93TFLOPS, ~900TB, >500PB, ~50MB/s)
15. Data Management – A Personal Perspective
Data Management is NOT just about data preservation
What data do I need to preserve
What data do I want to make visible
What legal frameworks do I need to adhere to
Who can access my data
Where do I need to move my data to
And when do I need to move it
17. EUDAT Service Suite
During this course you will learn:
How services link to data lifecycle
How services support the FAIR principles
How to use the Web Interface
Where available
How to use the APIs available to access services programmatically.
20. EUDAT Data Domain modeled on the ANDS Data Curation Continuum
(ANDS: Australian National Data Service, www.ands.org.au)
Data Domains
21. Help desk
Monitoring
Collaboration Tools (restricted access)
Service Catalogue and registry
Input to requirements
Data project
co-ordination
More than Just Services
22. Secure Access to Services
b2access.eudat.eu
B2ACCESS
B2ACCESS is an easy-to-use and secure authentication and
authorization platform which can be integrated with any
service and supports different methods of authentication.
23. An easy-to-use and secure
authentication and authorization
platform that can be integrated with any
service
The user may log in by using
different methods of
authentication:
Home organisation identity
provider
Social ID
EUDAT ID
Allows group-, community- and
service managers to specify
authorisation decisions
Features:
Easy integration in any service
Reliable and light-weight
Powerful management interface
24. b2drop.eudat.eu
Sync and Share Research Data
B2DROP
EUDAT’s Personal Cloud Storage Service
B2DROP is a secure and trusted data exchange service for
researchers and scientists to keep their research data synchronized
and up-to-date and to exchange with others.
25. An ideal solution for researchers and scientists to:
store and exchange data with colleagues and team members, including research data not finalized for publishing
share data with fine-grained access controls
synchronize multiple versions of data across different devices
Features:
20 GB storage per user
Living objects, so no PIDs
Versioning and offline use
Desktop synchronisation
26. Store and Publish Research Data
b2share.eudat.eu
B2SHARE
B2SHARE is a user-friendly, reliable and trustworthy way for
researchers, scientific communities and scientists to store and share
small-scale research data from diverse contexts.
27. A winning solution for researchers, scientists and communities to:
store data safely at a trusted and certified data centre
preserve data to guarantee long-term persistence
control access and share data with colleagues and the world
Features:
Metadata management
Permanent PIDs
Open Access support
28. Replicate Research Data Safely
eudat.eu/b2safe
B2SAFE
B2SAFE is a robust, safe and highly available service which allows
community and departmental repositories to implement data
management policies on research data across multiple administrative
domains in a trustworthy manner.
29. The ideal solution for communities with no facility for archival to:
replicate research data into secure data stores
archive and preserve research data in the long term
bring data close to powerful compute resources
co-locate data with different communities
benefit from economies of scale
Features:
Large-scale storage
Robust and highly available
Permanent PIDs
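As a rough illustration of what safe replication involves, the sketch below copies a file to several target directories and verifies each copy against a SHA-256 checksum of the original. This is not B2SAFE's implementation: the local directories stand in for remote data centres, and the function names and the choice of checksum are assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, replica_dirs):
    """Copy source into each directory; map each replica to True if verified."""
    expected = sha256_of(source)
    results = {}
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)
        # A replica only counts once its checksum matches the original.
        results[target] = sha256_of(target) == expected
    return results


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "community_repo" / "dataset.bin"
    original.parent.mkdir()
    original.write_bytes(b"example research data")
    # "site_a" and "site_b" are hypothetical replica locations.
    outcome = replicate(original, [root / "site_a", root / "site_b"])
    replica_count = len(outcome)
    all_verified = all(outcome.values())
    print(f"{replica_count} replicas, all verified: {all_verified}")
```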
30. Get Data to Computation
eudat.eu/b2stage
B2STAGE
B2STAGE is a reliable, efficient, light-weight and easy-to-use service
to transfer research data sets between EUDAT storage resources and
high-performance computing (HPC) workspaces
31. Facilitating communities to:
move large amounts of data between data stores and high-performance compute resources
re-ingest computational results back into EUDAT
deposit large data sets onto EUDAT resources for long-term preservation
Features:
High-speed transfer
Reliable and light-weight
Manages permanent PIDs
33. A metadata catalogue service to:
seek data objects and collections using powerful metadata searches
catalogue community data by means of selected metadata
browse through multi-disciplinary data collections filtered by content, provenance and temporal keywords
Features:
Simple to use
Standards-based
Comprehensive catalogue
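The kind of filtered browsing described on this slide can be pictured with a small in-memory catalogue. The sketch below is only an illustration under stated assumptions: the records are made up, the field names (community for provenance, year for the temporal filter, keywords for content) are hypothetical, and it does not reflect how the catalogue service is actually implemented or queried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    community: str    # provenance: which community supplied the record
    year: int         # temporal coverage, simplified to a single year
    keywords: tuple   # content keywords


# Made-up sample records, for illustration only.
CATALOGUE = [
    Record("Ocean temperature model run", "ENES", 2015, ("climate", "ocean")),
    Record("Seismic waveform archive", "EPOS", 2012, ("seismology",)),
    Record("Forest carbon flux series", "ICOS", 2015, ("carbon", "climate")),
]


def search(records, keyword=None, community=None, year_range=None):
    """Return the records matching every filter that was supplied."""
    hits = []
    for record in records:
        if keyword is not None and keyword not in record.keywords:
            continue
        if community is not None and record.community != community:
            continue
        if year_range is not None:
            first, last = year_range
            if not first <= record.year <= last:
                continue
        hits.append(record)
    return hits


climate_2015 = search(CATALOGUE, keyword="climate", year_range=(2015, 2015))
for record in climate_2015:
    print(record.community, "-", record.title)
```

Combining the filters with a logical AND is a design choice for the example; a real catalogue would typically also offer free-text search and facets.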
34. Data Discovery and Identification
b2handle.eudat.eu
B2HANDLE
B2HANDLE provides an abstraction layer between a globally
unique persistent identifier and a physical location of a data
object allowing researchers to reliably cite and refer in the
long term.
35. Provides abstraction layer between
a globally unique persistent
identifier and physical location
of data objects
Follows policies to register data
and make it referable and citable
in the long term
Features:
Reliability through mutual PID mirroring
Machine readable via HTTP RESTful API
Simple integration with any service
Technology agnostic
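The abstraction layer idea can be shown with a toy registry: a citation carries only the identifier, and the registry maps it to wherever the object currently lives. This is a minimal in-memory sketch, not the B2HANDLE implementation; the class, the identifier and the example.org locations are all made up for illustration.

```python
class PidRegistry:
    """Toy registry mapping persistent identifiers to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, location):
        # Identifiers are globally unique, so re-registration is rejected.
        if pid in self._locations:
            raise ValueError(f"PID already registered: {pid}")
        self._locations[pid] = location

    def update_location(self, pid, new_location):
        # The object moved; the identifier itself stays unchanged.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location

    def resolve(self, pid):
        return self._locations[pid]


registry = PidRegistry()
pid = "12345/example-object"  # made-up identifier in prefix/suffix form
registry.register(pid, "https://repo-a.example.org/data/42")
before = registry.resolve(pid)

# The data is migrated to another repository; citations keep using the PID.
registry.update_location(pid, "https://repo-b.example.org/archive/42")
after = registry.resolve(pid)
print(before, "->", after)
```

Because a paper cites the identifier rather than the URL, the migration in the last step does not break any existing reference.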
Sensor resolution is increasing (synchrotron and muon sources), genetic sequencing is becoming quicker and sequencers are becoming cheaper, leading to super-exponential growth.
EUDAT cooperates with a wide variety of research communities, such as the medical and biomedical sciences, environmental sciences, materials and analytical facilities, social sciences and humanities and physical sciences and engineering.
EUDAT has concrete agreements with 7 core communities, an integral part of the initiative, namely:
CLARIN: Common Language Resources and Technology Infrastructure
ELIXIR: A distributed infrastructure for life-science information
ENES: European Network for Earth System Modelling
EPOS: European Plate Observing System
ICOS: Integrated Carbon Observation System
LTER Europe: European Long Term Ecological Research Network
VPH: Virtual Physiological Human
EUDAT works on a Collaborative Data Infrastructure conceived as a network of collaborating, cooperating centers, combining the richness of numerous community specific data repositories with the permanence and persistence of some of Europe’s largest scientific data centers.
B2ACCESS is an easy-to-use and secure authentication and authorization platform which can be integrated with any service and supports different methods of authentication.
B2ACCESS provides an easy-to-use and secure authentication and authorization platform integrated in all other services. It provides different methods of authentication through the home organisation identity provider, but also allows social IDs like Google and Facebook as well as the EUDAT ID. Managers can specify authorisation decisions in the dedicated interface.
For more information see: b2access.eudat.eu
The EUDAT Services Suite consists of seven services. Each service is presented in the following slides, starting with B2DROP.
The B2DROP service can be characterized as a personal cloud storage service. It is a secure and trusted data exchange service.
The B2DROP service is a cloud solution to store and share data in the early stages of the research data life cycle. It is aimed at individual researchers and enables the storage and exchange of data with colleagues and team members. Data can be shared with fine-grained access controls. B2DROP synchronizes multiple versions of data across different devices and platforms. B2DROP users are offered up to 20 GB of storage space for their data.
The B2DROP service can be found at b2drop.eudat.eu.
The next service of the EUDAT Services Suite is B2SHARE, a service to store and share small-scale research data from diverse contexts.
The B2SHARE service is aimed at individual researchers. It has been integrated in a number of research infrastructures, and EUDAT defines custom-made, community-based metadata schema templates to help users describe their data.
B2SHARE facilitates data storage in a trusted and certified repository that guarantees long-term persistence of the data. Data objects get a persistent identifier. Depositors can document their data objects and give the data a usage license, preferably an open access license.
The B2SHARE service can be found at b2share.eudat.eu.
The third service of the EUDAT Services Suite is the B2SAFE service.
This service allows community and department repositories to implement data management policies on research data across multiple administrative domains.
The B2SAFE service is aimed at research communities that have no facilities for archival data storage. The service supports a number of procedures such as data replication and the co-location of data with different communities. B2SAFE supports large-scale storage at the petabyte level.
More information on the B2SAFE service can be found at eudat.eu/b2safe.
The B2STAGE service enables the movement of large amounts of data between data stores and high-performance computing resources.
The B2STAGE service is aimed at research communities and infrastructures to move large amounts of data between data stores and high-performance computing resources, to re-ingest computational results back into EUDAT and to deposit large data sets onto EUDAT resources for long-term preservation.
More information on the B2STAGE service can be found at: eudat.eu/b2stage.
The last service of the EUDAT Service Suite is the B2FIND service. It can be characterized as a simple, user-friendly metadata catalogue of research data collections stored in EUDAT data centers and other repositories.
The B2FIND service enables searching and browsing for data objects and collections and supports a number of metadata formats. B2FIND facilitates browsing through multi-disciplinary data collections.
More information on the B2FIND service can be found at: https://b2find.eudat.eu.
B2HANDLE provides an abstraction layer between a globally unique persistent identifier and a physical location of a data object allowing researchers to reliably cite and refer in the long term.
B2HANDLE provides an abstraction layer between a globally unique persistent identifier and a physical location of data objects. It follows policies to register data and make it referable and citable in the long term.
The service provides high reliability and availability and can be easily integrated in any other service or application using an HTTP RESTful API. The service is therefore technology-agnostic.
For more information see: eudat.eu/b2handle
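To give a feel for machine-readable access over HTTP, the sketch below builds a lookup URL and extracts the current location from a handle record. It assumes the REST endpoint of the public Handle.net proxy (https://hdl.handle.net/api/handles/) and its JSON record layout; whether a given EUDAT handle server exposes the same path is an assumption, and the identifier and sample response are made up so that the example runs without network access.

```python
import json
from urllib.parse import quote

# Assumption: the public Handle.net proxy endpoint, not an EUDAT-specific one.
PROXY_API = "https://hdl.handle.net/api/handles/"


def handle_api_url(pid: str) -> str:
    """Build the lookup URL for a handle, keeping the prefix/suffix slash."""
    return PROXY_API + quote(pid, safe="/")


def extract_location(record: dict):
    """Return the value of the first URL-typed entry in a record, if any."""
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


# Made-up response body in the shape such a lookup is assumed to return.
SAMPLE_RESPONSE = json.loads("""
{
  "responseCode": 1,
  "handle": "12345/example-object",
  "values": [
    {"index": 1, "type": "URL",
     "data": {"format": "string",
              "value": "https://repo-b.example.org/archive/42"}}
  ]
}
""")

lookup_url = handle_api_url(SAMPLE_RESPONSE["handle"])
location = extract_location(SAMPLE_RESPONSE)
print(lookup_url)
print(location)
```

In real use the JSON would be fetched from the lookup URL with any HTTP client; the parsing step stays the same.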
Most of the EUDAT services have their own dedicated service domain and documentation. For all services, documentation, tutorials and training either already exist or are in preparation.
The service suite will be enhanced and expanded. So keep in touch. If you have any questions or remarks concerning the services or the EUDAT initiative, please use the contact form available at the EUDAT website: eudat.eu.