In this video from the 2013 National HPCC Conference, Wolfgang Gentzsch presents: EUDAT and Big Data in Science.
Big data science emerges as a new paradigm for scientific discovery that reflects the increasing value of observational, experimental and computer-generated data in virtually all domains, from physics to the humanities and social sciences. Addressing this new paradigm, the EUDAT project is a European data initiative that brings together a unique consortium of 25 partners from 13 countries, including research communities, national data and high performance computing (HPC) centers, technology providers, and funding agencies. EUDAT aims to build a sustainable cross-disciplinary and cross-national data infrastructure that provides a set of shared services for accessing and preserving research data. The design and deployment of these services are being coordinated by multi-disciplinary task forces comprising representatives from research communities and data centers.
You can watch the presentation with audio at insideHPC: http://insidehpc.com/2013/03/27/video-eudat-and-big-data-in-science/
1. EUDAT
EUDAT and Big Data in Science
Wolfgang Gentzsch, Advisor, EUDAT
HPCC 2013 Newport RI, 26-28 March 2013
2. Data trends
Exponential growth: Gigabytes, Terabytes, Petabytes, Exabytes, Zettabytes
Increasing complexity and variety
• Where to store it?
• How to find it?
• How to make the most of it?
• How to ensure interoperability?
3. The EUDAT Case
If there are hundreds of Research Infrastructures, how many different data management systems can we sustain?
4. Collaborative Data Infrastructure
- A framework for the future? -
Data Generators, Users: User functionalities, data capture & transfer, virtual research environments
Community Support Services: Data discovery & navigation, workflow generation, annotation, interpretability
Common Data Services: Persistent storage, identification, authenticity, workflow execution, mining
Data Curation
Trust
7. Five research communities on Board
• EPOS: European Plate Observing System
• CLARIN: Common Language Resources and Technology Infrastructure
• ENES: Service for Climate Modelling in Europe
• LifeWatch: Biodiversity Data and Observatories
• VPH: The Virtual Physiological Human
• All share common challenges:
– Reference models and architectures
– Persistent data identifiers
– Metadata management
– Distributed data sources
– Data interoperability
15. Building Blocks of the CDI
EUDAT Portal: Integrated APIs and harmonized access to EUDAT facilities
Metadata Catalogue: Aggregated EUDAT metadata domain. Data inventory
AAI: Network of trust among authentication and authorization actors
Data Staging: Dynamic replication to HPC workspace for processing
Safe Replication: Data curation and access optimization
Simple Store: Researcher data store (simple upload, share and access)
16. SAFE_REPLICATION@EUDAT
Allow communities to replicate data to selected data centers for storage and do this in a robust, reliable and highly available manner.
Improve data curation and accessibility.
More info: eudat-safereplication@postit.csc.fi
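The slides do not describe how the service is implemented. As a rough illustration of the idea only, the following Python sketch copies a file to several target directories (standing in for data centers) and keeps a replica only if its checksum matches the source. The function names `sha256_of` and `replicate` and the directory layout are assumptions made for this example, not part of EUDAT.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, targets: list[Path]) -> dict[str, str]:
    """Copy `source` into every directory in `targets` and verify each copy.

    The target directories are hypothetical stand-ins for data centers.
    Returns a mapping from replica path to its verified checksum.
    """
    expected = sha256_of(source)
    replicas: dict[str, str] = {}
    for target_dir in targets:
        target_dir.mkdir(parents=True, exist_ok=True)
        copy = target_dir / source.name
        shutil.copy2(source, copy)
        actual = sha256_of(copy)
        if actual != expected:
            # Discard a corrupted replica instead of keeping bad data.
            copy.unlink()
            raise IOError(f"checksum mismatch for replica at {copy}")
        replicas[str(copy)] = actual
    return replicas
```

Verifying every replica against the source checksum is one simple way to make replication "robust and reliable" in the sense of the slide; a production service would also need retries, scheduling and access control.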
17. DATA_STAGING@EUDAT
Allow the communities to dynamically replicate a subset of their data stored in EUDAT to an HPC workspace in order to be processed.
More info: eudat-datastaging@postit.csc.fi
18. METADATA@EUDAT
Create a joint metadata domain for all data stored by EUDAT data centers and a catalogue which exposes the data stored within EUDAT, allowing data searches.
The EUDAT repository should provide an inventory of metadata from different communities.
More info: eudat-metadata@postit.csc.fi
19. SIMPLE_STORE@EUDAT
Create an easy-to-use service that will help researchers, mediated by the participating communities, to upload and store data which is not part of the officially handled data sets of the community.
This service will address the long tail of “small” data and the researchers/citizen scientists creating/manipulating them.
More info: eudat-simplestore@postit.csc.fi
20. Persistent_Identifiers@EUDAT
Deploy a robust, highly available and effective PID service that can be used within the communities and by EUDAT.
Keeping track of the “names” of data sets deposited with the CDI requires robust mechanisms.
More info: eudat-persistentidentifiers@postit.csc.fi
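To make the notion of tracking data set "names" concrete, here is a minimal in-memory Python sketch of a persistent identifier (PID) registry: an identifier is minted once and keeps resolving to the current locations of the data, even as replicas are added. The class `PidRegistry`, the `example` prefix and the URLs are hypothetical and chosen for illustration; the slides do not specify which PID system or API EUDAT uses.

```python
import uuid


class PidRegistry:
    """Toy PID registry: maps a stable identifier to data locations."""

    def __init__(self, prefix: str = "example") -> None:
        # "example" is a made-up prefix, not a real registered namespace.
        self.prefix = prefix
        self._records: dict[str, list[str]] = {}

    def mint(self, location: str) -> str:
        """Create a new identifier that points at one initial location."""
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self._records[pid] = [location]
        return pid

    def add_location(self, pid: str, location: str) -> None:
        """Register an additional replica for an existing identifier."""
        self._records[pid].append(location)

    def resolve(self, pid: str) -> list[str]:
        """Return all known locations; raises KeyError for unknown PIDs."""
        return list(self._records[pid])


registry = PidRegistry()
pid = registry.mint("https://center-a.example/data/42")
registry.add_location(pid, "https://center-b.example/data/42")
locations = registry.resolve(pid)
```

The point of the indirection is that citations and metadata refer to the identifier, while the locations behind it can change. A real service would add persistence, authentication and high availability.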
21. AAI@EUDAT
Provide a solution for a working AAI system in a federated scenario.
Design the AA infrastructure to be used during the EUDAT project and beyond.
More info: eudat-AAI@postit.csc.fi
23. Work plan for the next months
• Moving the services to a production environment
• Capturing additional requirements
• Integrating new partners into EUDAT (in particular research communities)
– Working groups, pilots, observers and associate partners
• Collaborating with other initiatives
– European e-Infrastructures: EGI, PRACE, DANTE, HELIX NEBULA, SCIDIPS-ES, etc.
– Global initiatives: RDA, CODATA, etc.
• Defining EUDAT’s path to sustainability
– Cost and funding models
– Governance
24. Welcome to the 2nd EUDAT Conference!
28-30 October 2013, Rome
• International event with keynotes from Europe and US
• A forum to discuss the future of data infrastructures
• Project presentations and poster sessions
• Training tutorials