The presentation offers a perspective on how distributed computing has been instrumental in making groundbreaking scientific discoveries possible, and on how opening up computing infrastructures at the international level has delivered unprecedented compute capacity and advanced data analytics tools to international research collaborations.
It gives examples of the enormous scientific impact of international collaboration between cyberinfrastructures in Europe, Africa and other continents, and explains the federated organisational model that European countries have adopted to leverage and mobilise national ICT investments.
It also gives an overview of the present and future technical and organisational challenges of data-driven research in various scientific domains. The European Commission's European Open Science Cloud initiative is explained, and opportunities for collaboration are discussed with the audience.
Conference website: http://www.eresearch-africa.uct.ac.za/
Distributed scientific computing for open science, eResearch Africa 2019
www.egi.eu
@EGI_eInfra
The work of the EGI Foundation is partly funded by the European Commission under the H2020 Framework Programme
EGI: Advanced Computing for Research
eResearch Africa 2019
Distributed scientific computing for open science
Tiziana Ferrari, Technical Director, EGI Foundation
@EGI_eInfra | www.egi.eu | 17/04/2019
EGI Federation (April 2019)
• ZA-UCT-ICTS: University of Cape Town, ICTS HPC site
• ZA-UFS: University of the Free State Computing Centre, the HPC site of the University of the Free State in Bloemfontein
• ZA-WITS-CORE: University of the Witwatersrand CORE, the core research cluster of the University of the Witwatersrand
• CPU core wall time in 2018: 4.4 billion
• > 1 million computing cores in 2019
• > 740 PB of disk and tape
• 2,915 service end-points
Benefits of Federation
• Leverages national e-Infrastructure investments
• Opens access to part of the nationally funded capacity
• Supports international user groups
• Integrates community, private and/or public infrastructures into a scalable data/computing platform for research
• Uses federated identities, authentication and authorization
• Ensures interoperability of scientific applications and data across multiple providers, bringing distributed computing to the data
Architecture and Interfaces
• Tools to deal with heterogeneity:
  IaaS orchestration tools with support for multiple APIs:
  o Infrastructure Manager, Terraform, OCCOPUS, …
  o https://wiki.egi.eu/wiki/Federated_Cloud_IaaS_Orchestration
  IaaS libraries with support for multiple APIs:
  o libcloud, jclouds, …
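Libraries such as libcloud and jclouds hide each provider's native API behind one common interface. The pattern can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: the class names, site names and identifier formats are hypothetical, the two back ends are only loosely modelled on OpenStack (flavors) and OpenNebula (CPU and memory in MB), and none of it reproduces the actual libcloud or jclouds API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class VMSpec:
    """Provider-neutral description of a virtual machine request."""
    name: str
    cores: int
    memory_gb: int


class CloudDriver(ABC):
    """Common interface that hides each provider's native API (hypothetical)."""

    @abstractmethod
    def create_vm(self, spec: VMSpec) -> str:
        """Start a VM and return a provider-specific identifier."""


class OpenStackLikeDriver(CloudDriver):
    # Hypothetical back end: the native API expects a named "flavor".
    def create_vm(self, spec: VMSpec) -> str:
        flavor = f"{spec.cores}c{spec.memory_gb}g"
        return f"openstack:{spec.name}:{flavor}"


class OpenNebulaLikeDriver(CloudDriver):
    # Hypothetical back end: the native API takes raw CPU count and memory in MB.
    def create_vm(self, spec: VMSpec) -> str:
        return f"opennebula:{spec.name}:cpu={spec.cores},mem={spec.memory_gb * 1024}MB"


# Hypothetical federation: two sites, each reached through its own driver.
DRIVERS = {"site-a": OpenStackLikeDriver(), "site-b": OpenNebulaLikeDriver()}


def deploy_everywhere(spec: VMSpec) -> dict:
    """Submit the same provider-neutral request to every site in the federation."""
    return {site: driver.create_vm(spec) for site, driver in DRIVERS.items()}
```

Calling `deploy_everywhere(VMSpec("worker", 4, 8))` sends one request description to both sites; only the drivers know how each native API wants it expressed, which is what lets orchestration tools stay provider-agnostic.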
Federated AAI: Check-in
• Identity and Access Management solution
  o Single sign-on to services through eduGAIN, social media and other institutional or community-managed identity providers
  o Only one account needed for federated access to multiple heterogeneous (web and non-web) service providers using different technologies (SAML, OpenID Connect, OAuth 2.0, X.509)
  o Identity linking enables access to resources using different login credentials (institutional/social)
  o Assurance information associated with each authenticated identity
  o Aggregation and harmonisation of authorisation information (VOs/groups, roles, assurance) from multiple sources
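Identity linking and the aggregation of authorisation information can be illustrated with a small in-memory model: several login credentials resolve to one community account, and group/role entitlements from several sources are merged into one set. This is a sketch under assumptions, not Check-in's implementation: the registry, the login strings, the community identifier and the simplified entitlement format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedAccount:
    """One community identity with several linked login credentials."""
    subject_id: str                                   # hypothetical persistent community identifier
    logins: set = field(default_factory=set)          # institutional, social, ... credentials
    entitlements: set = field(default_factory=set)    # aggregated VO/group/role information


class IdentityRegistry:
    """Maps every linked credential back to its community account."""

    def __init__(self):
        self._by_login = {}

    def link(self, account: LinkedAccount, login: str) -> None:
        """Attach an extra credential (e.g. a social or institutional login) to an account."""
        account.logins.add(login)
        self._by_login[login] = account

    def resolve(self, login: str):
        """Return the community account for any linked credential, or None if unknown."""
        return self._by_login.get(login)


def aggregate_entitlements(account: LinkedAccount, *sources: dict) -> list:
    """Merge group/role information from several attribute sources, dropping duplicates."""
    for source in sources:
        account.entitlements |= set(source.get(account.subject_id, ()))
    return sorted(account.entitlements)
```

For example, after linking an institutional login and a social login to the same account, both resolve to that account, and membership data from a VO registry and a group manager collapse into one deduplicated list that a service can evaluate uniformly.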
Check-in: Identity Provider and Service Provider Proxy
• Conforms to the AARC Blueprint Architecture
• Registered in eduGAIN as an SP complying with REFEDS Research & Scholarship and Sirtfi
• All community SPs can have one statically configured IdP
• No need to run an IdP Discovery Service on each SP
• Connected SPs get consistent, harmonised user identifiers and accompanying attribute sets from different IdPs/AAs that can be interpreted uniformly for authorisation purposes
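The harmonisation step performed by an IdP/SP proxy can be sketched as a per-IdP mapping onto one attribute schema, plus a stable community identifier that does not depend on which IdP the user logged in with. The source attribute names follow common eduPerson/SAML (`mail`, `displayName`, `eduPersonUniqueId`) and OpenID Connect (`sub`, `email`, `name`) conventions, but the mapping table, the IdP labels and the output schema are assumptions made for illustration, not Check-in's configuration.

```python
# Hypothetical per-IdP mappings from the attribute names an IdP releases
# to the single schema that every connected SP receives from the proxy.
ATTRIBUTE_MAPS = {
    "idp-university": {"mail": "email", "displayName": "name", "eduPersonUniqueId": "source_id"},
    "idp-social": {"email": "email", "name": "name", "sub": "source_id"},
}


def harmonise(idp: str, released: dict, community_id: str) -> dict:
    """Translate one IdP's released attributes into the proxy's uniform attribute set."""
    mapping = ATTRIBUTE_MAPS[idp]
    uniform = {target: released[src] for src, target in mapping.items() if src in released}
    # The proxy always issues the same community identifier, whichever IdP was used,
    # so SPs can key authorisation decisions on one stable value.
    uniform["sub"] = community_id
    uniform["authenticating_idp"] = idp
    return uniform
```

Because every SP sees only the proxy's output, each SP needs a single, statically configured IdP (the proxy) and no discovery service of its own, which is the design choice the slide describes.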
Federation of Data Repositories
• Heterogeneous backend storage
• Common interfaces (Web, REST, POSIX, CDMI)
• Common AAI with Check-in
• Discovery of datasets in the EGI DataHub
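Dataset discovery over heterogeneous back ends usually relies on adapters that normalise each repository's native metadata into one record type, so a single index can be searched. The sketch below assumes two hypothetical repositories (a POSIX share and a REST-style store) and hypothetical record layouts; it does not reproduce the EGI DataHub API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Common record type shared by all federated repositories."""
    title: str
    repository: str
    access_url: str


def from_posix_repo(entry: dict) -> DatasetRecord:
    # Hypothetical back end A: a POSIX file share described by name and path.
    return DatasetRecord(entry["name"], "repo-a", "file://" + entry["path"])


def from_rest_repo(entry: dict) -> DatasetRecord:
    # Hypothetical back end B: a store exposing JSON records over REST.
    return DatasetRecord(entry["metadata"]["title"], "repo-b", entry["links"]["self"])


def build_index(sources: list) -> list:
    """Normalise every repository's records into one searchable list.

    `sources` is a list of (adapter, entries) pairs, one per repository.
    """
    return [adapter(entry) for adapter, entries in sources for entry in entries]


def search(index: list, keyword: str) -> list:
    """Case-insensitive keyword search over dataset titles."""
    keyword = keyword.lower()
    return [rec for rec in index if keyword in rec.title.lower()]
```

Adding a new kind of storage then only requires a new adapter; the index, the search and the users' view of the federation stay unchanged.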
Transparent Data Access
• Clients use one or more providers to access data
• Data can be accessed over multiple protocols
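One way a client can use several providers transparently is to select, among the replicas of a file, one that offers a protocol the client speaks, preferring the closest. The sketch below is an assumption-laden illustration: provider names, latency figures and the selection rule (lowest latency, then the client's own protocol preference) are hypothetical, not how any specific EGI service chooses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    provider: str
    protocols: tuple   # access protocols this provider offers for the file
    latency_ms: float  # hypothetical measured round-trip time from the client


def choose_replica(replicas: list, client_protocols: list) -> tuple:
    """Pick the lowest-latency replica reachable with a protocol the client speaks.

    Returns (provider, protocol); raises LookupError if no replica is usable.
    """
    usable = [r for r in replicas if set(r.protocols) & set(client_protocols)]
    if not usable:
        raise LookupError("no replica is reachable with the client's protocols")
    best = min(usable, key=lambda r: r.latency_ms)
    # When several protocols are shared, follow the client's own preference order.
    protocol = next(p for p in client_protocols if p in best.protocols)
    return best.provider, protocol
```

A client that only speaks WebDAV and one that also speaks a REST API may therefore read the same dataset from different providers without either of them knowing where it is stored.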
Data Caching
• Cloud provider A hosts data & computing resources
• Provider B only hosts data
• Provider X can use data from A and B:
  o Without pre-staging
  o Via pre-staging using APIs
  o Local data access “à la” POSIX with FUSE
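The two access modes on the slide, on-demand access without pre-staging and explicit pre-staging, can both be modelled with a read-through cache on provider X. The sketch below is a hypothetical in-memory model (block keys, capacity counted in blocks, least-recently-used eviction); a real deployment would sit behind a FUSE mount or a storage API, neither of which is shown here.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Provider X's local cache in front of remote data held by providers A and B."""

    def __init__(self, fetch_remote, capacity: int):
        self._fetch = fetch_remote        # callable(key) -> bytes, e.g. a remote read from A or B
        self._capacity = capacity         # maximum number of cached blocks (assumption)
        self._blocks = OrderedDict()      # key -> data, ordered from least to most recently used
        self.remote_reads = 0

    def read(self, key: str) -> bytes:
        """On-demand access (no pre-staging): fetch on first use, then serve locally."""
        if key in self._blocks:
            self._blocks.move_to_end(key)  # mark as most recently used
            return self._blocks[key]
        data = self._fetch(key)
        self.remote_reads += 1
        self._store(key, data)
        return data

    def prestage(self, keys) -> None:
        """Explicit pre-staging: copy the listed blocks before the job starts."""
        for key in keys:
            if key not in self._blocks:
                self._store(key, self._fetch(key))
                self.remote_reads += 1

    def _store(self, key: str, data: bytes) -> None:
        self._blocks[key] = data
        self._blocks.move_to_end(key)
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
```

Repeated reads of the same block hit the local copy, while pre-staging moves the remote cost to before the computation starts; with a small cache, pre-staged blocks can evict earlier ones, which is the trade-off between the two modes.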
Reproducible Open Science with EGI Notebooks, Binder, Zenodo
• The computational tools to solve a problem: Python, R, Julia, and a wide ecosystem of libraries and tools for science
• An interface to facilitate coding/creating: Jupyter
• A way to communicate work: Notebooks
• A way to share work: GitHub or other similar repositories
• A way to pack it all for replication: Docker
• A way to persistently identify it: DOIs (Digital Object Identifiers)
https://documents.egi.eu/document/3442
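The last step, persistent identification, works because a DOI of the form `10.<registrant>/<suffix>` can always be resolved through the https://doi.org proxy. The sketch below normalises a DOI string and builds its resolver URL. The regular expression is a deliberately simplified check, not the full DOI syntax, and the example DOI combines Zenodo's prefix (10.5281) with a made-up record number.

```python
import re
from urllib.parse import quote

# Simplified DOI check: "10." + numeric registrant code + "/" + non-empty suffix.
# The full DOI syntax allows more (e.g. dotted registrant subdivisions).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org resolver URL for a DOI, after a basic syntax check."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unsafe characters in the suffix, keeping the "/" separator.
    return "https://doi.org/" + quote(doi, safe="/")
```

For instance, `doi_to_url("doi:10.5281/zenodo.1234567")` returns `https://doi.org/10.5281/zenodo.1234567`, a link that stays valid even if the archive behind it moves.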
Today’s scenario
• Difficult cross-border access due to different funding models, access and provisioning policies
  Data and service provisioning to international user communities is possible only when supported by sound business models or existing collaboration agreements. Today only a few structured international research groups have achieved this.
• Large investments are needed for the creation, processing, preservation, access and reuse of research data: will the funding match the anticipated needs of future data-intensive science?
  Opportunities for economies of scale and aggregation of demand can arise with joint provisioning of common infrastructure components.
• Major separation between data preservation and data exploitation infrastructures in many disciplines
  RIs and e-Infrastructures should collaborate to support the entire research workflow of an experiment.
Tomorrow’s scenario
The International Data Commons
A federation of research data, computing, applications and other open science resources, responding to the problem of scalable access to research data through a new data provisioning service approach that is complementary to the traditional data download model.
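A short calculation shows why the download model stops scaling for large datasets. Assuming a hypothetical 1 PB dataset (decimal units, 10^15 bytes) and a dedicated 10 Gbit/s link used at full line rate, the transfer needs 8 × 10^15 bits / 10^10 bit/s = 8 × 10^5 s, roughly 9.3 days, before any analysis can start. The helper below reproduces that arithmetic; the volume, link speed and efficiency factor are illustrative assumptions, not measurements.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(volume_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Time in days to move `volume_bytes` over a link of `link_gbps` gigabit/s.

    `efficiency` is the fraction of the nominal line rate actually achieved (assumption).
    """
    bits = volume_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / SECONDS_PER_DAY
```

`transfer_days(1e15, 10)` gives about 9.26 days, and at 50% link efficiency about 18.5 days; running the analysis next to the data and moving only the (usually much smaller) results avoids most of that cost.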
The Data Commons Should…
• Allow users to discover, access and analyze major research datasets and information for third-party exploitation
• Provide access to data and data products close to processing facilities, while avoiding duplication of local data storage and compute infrastructures across research-performing organisations in Europe
The Data Commons Should…
• Offer a hybrid distributed compute platform (HTC, HPC, cloud) and an integrated, rich portfolio of scientific application tools supporting self-service provisioning
• Offer tools for scalable data movement across data preservation infrastructures and a distributed, interconnected network of “data hubs”
• Provide integrated capabilities for publishing and sharing scientific outputs from experiments to support open science
• Support federated authentication and authorization, so that researchers can use their existing personal credentials and easy-to-use access channels
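The three platform types in a hybrid offering suit different workload shapes, and a self-service portal needs some rule for routing each request. The sketch below shows one such rule. It is a hypothetical policy for illustration only: the workload attributes and the mapping (persistent services to cloud, tightly coupled parallel jobs to HPC, everything else to HTC) are assumptions, not a scheduler described in this presentation.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    tightly_coupled: bool               # needs low-latency inter-node communication (e.g. MPI)
    long_running_service: bool = False  # e.g. a notebook server or web portal


def choose_platform(w: Workload) -> str:
    """Hypothetical routing rule for a hybrid HTC/HPC/cloud platform."""
    if w.long_running_service:
        return "cloud"  # on-demand VMs/containers suit persistent services
    if w.tightly_coupled:
        return "HPC"    # low-latency interconnect for parallel jobs
    return "HTC"        # many independent jobs spread across sites
```

For example, a parameter sweep of independent simulations would be routed to HTC, a coupled climate model to HPC, and a shared Jupyter service to cloud resources.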
About the European Open Science Cloud
The federated infrastructure and supporting initiative providing all researchers, innovators, companies and citizens with seamless access to an open-by-default, efficient and cross-disciplinary environment for storing, accessing and reusing data, tools, publications and other scientific outputs for research, innovation and educational purposes.
This work by the EGI Foundation is licensed under a Creative Commons Attribution 4.0 International License.
Questions?
Thank you for your attention.