Description and scope of the Project
PHIDIAS HPC aims to develop a consolidated and shared HPC and Data service by building on pre-existing and emerging infrastructure in order to create a federation of "user to infrastructure" services.
To achieve its purpose and to gain a comprehensive picture of the European infrastructure landscape, three use cases in different data areas will develop and provide new services to discover, manage and process spatial and environmental data produced by research communities tackling scientific challenges such as atmospheric, marine and earth observation issues.
Webinar: How to improve the cloud services for marine data
Observing the ocean is challenging: missions at sea are costly, different scales of processes interact, and the conditions are constantly changing, which is why scientists say that "a measurement not made today is lost forever". For these reasons, it is fundamental to properly store both the data and metadata, so that their access can be guaranteed for the widest community, in line with the FAIR principles: Findable, Accessible, Interoperable and Reusable.
PHIDIAS HPC has organised a webinar entitled "PHIDIAS: Boosting the use of cloud services for marine data management, services and processing" to be held on 4th June 2020 at 11 AM CEST. The webinar aims to introduce the PHIDIAS HPC initiative, in collaboration with the Blue-Cloud project, to the European HPC and Research community, specifically in the Blue economy, to improve the use of (1) cloud services for marine data management, (2) data services to the user in a FAIR perspective, and (3) data processing on demand.
These objectives will be pursued in coherence with the development of the European Open Science Cloud (EOSC) and the Copernicus Data and Information Access Services (DIAS).
PHIDIAS - Boosting the use of cloud services for marine data management, services and processing
1. The PHIDIAS project has received funding from the European Union's Connecting Europe Facility under grant agreement n° INEA/CEF/ICT/A2018/1810854.
PHIDIAS: Boosting the use of cloud
services for marine data management,
services and processing
Webinar | June 4, 2020, 11:00 AM CEST
2. PHIDIAS Ocean Use Case
04.06.2020 PHIDIAS Webinar | 13.02.2020 | https://www.phidias-hpc.eu/ | @PhidiasHpc
3. Webinar Agenda
11:00 - 11:05 - Introduction of PHIDIAS project - Francesco Osimanti, Trust-IT Services, PHIDIAS WP7 Leader
11:05 - 11:15 - PHIDIAS Ocean use case and contribution of HPC to marine studies - Cécile Nys, IFREMER
11:15 - 11:25 - Exploring advanced cloud services for marine and oceanographic data access and data management - Gilbert Maudire, IFREMER
11:25 - 11:30 - Q&A Session
11:30 - 11:40 - Passport photos for plankton: new era for marine biology research - Jukka Seppälä, SYKE
11:40 - 11:50 - Analyzing ocean observations in an HPC infrastructure with DIVAnd - Alexander Barth, University of Liege
11:50 - 12:00 - Blue-Cloud Platform: marine-thematic EOSC services for Marine Research and the Blue Economy - Pasquale Pagano, CNR-ISTI & Blue-Cloud Project
12:00 - 12:05 - Q&A Session
12:05 - 12:10 - Closing remarks
5. PHIDIAS Ocean use case and contribution of HPC to marine studies
Cécile NYS, IFREMER
Assistant Manager Ocean Data Cluster – ODATIS
Phidias WP6 member
Webinar | June 4, 2020
6. WP6 “Use-case 3 – Ocean” overview
Combine, collocate and process data from several data sources (in situ & satellite)
Enhance data archiving (most observations cannot be reproduced) to facilitate data reuse
Facilitate and speed up the co-localisation and processing of data from different sources
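As an illustration of what collocating in situ and satellite data can involve, the following is a minimal sketch and not the PHIDIAS implementation. It pairs each in situ record with the nearest satellite pixel inside a distance and time tolerance. The record layout, the function names and the tolerances (25 km, 12 hours) are assumptions made for this example only.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def collocate(in_situ, satellite, max_km=25.0, max_dt=timedelta(hours=12)):
    """Pair each in situ record with the closest satellite pixel that lies
    within both tolerances; the partner is None when nothing qualifies."""
    pairs = []
    for obs in in_situ:
        best, best_d = None, max_km
        for pix in satellite:
            if abs(pix["time"] - obs["time"]) > max_dt:
                continue  # too far apart in time
            d = haversine_km(obs["lat"], obs["lon"], pix["lat"], pix["lon"])
            if d <= best_d:
                best, best_d = pix, d
        pairs.append((obs, best))
    return pairs


# Hypothetical sample records (values invented for the example).
in_situ = [
    {"lat": 45.0, "lon": -20.0, "time": datetime(2020, 6, 4, 12), "salinity": 35.6},
    {"lat": 60.0, "lon": -30.0, "time": datetime(2020, 6, 4, 12), "salinity": 35.1},
]
satellite = [
    {"lat": 45.1, "lon": -20.1, "time": datetime(2020, 6, 4, 6), "salinity": 35.4},
    {"lat": 45.0, "lon": -20.0, "time": datetime(2020, 6, 1, 6), "salinity": 35.9},
]
pairs = collocate(in_situ, satellite)
```

The double loop is quadratic in the number of records, which is exactly the cost that a spatial index or a data-cube layout is meant to avoid at scale.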
7. WP6 “Use-case 3 – Ocean” overview
Combine and collocate data from several data sources (in situ & satellite)
Adopting new data structures (based on big-data technologies):
DataCubes
NoSQL databases (numerical data): Cassandra, MongoDB, etc.
Semantic Web (text data)
Providing on demand data browsing and processing facilities
8. Case-studies
Surface Salinity in North Atlantic: CTD (SeaDataNet), Argo floats (CMEMS), SMOS satellite.
Chlorophyll in North-East Atlantic and Baltic Sea: CTD and bottles (SeaDataNet), BGC Argo floats (ARGO GDAC), Ferrybox, Sentinel 2 images (DIAS WEkEO).
9. Data Infrastructure Harmonisation
Data flow: Collections, Data lake, Processing
10. Data Infrastructure Harmonisation
Collections, Data lake, Processing
Peter THIJSSE (presented by Gilbert MAUDIRE): Exploring advanced cloud services for marine and oceanographic data access and data management
11. Data Infrastructure Harmonisation
Collections, Data lake, Processing
Jukka SEPPÄLÄ: Passport photos for plankton: new era for marine biology research
12. Data Infrastructure Harmonisation
Collections, Data lake, Processing
Alexander BARTH: Analyzing ocean observations in an HPC infrastructure with DIVAnd
14. Cloud services for marine and oceanographic data access and data management
Gilbert Maudire (Ifremer) / Peter Thijsse (MARIS)
June 4, 2020, 11:25 AM CEST
15. Outline
Introduction
Data resources in scope
Discovery service
Prototype Data Lake for processing
16. Main objective recap
To improve the use of cloud services for marine data management, data service to users in a FAIR perspective, data processing on demand, taking into account the European Open Science Cloud (EOSC) challenge and the Copernicus Data and Information Access Services (DIAS).
17. Marine data resources in scope
In situ: SeaDataNet, Euro-ARGO, CMEMS
Remote sensing: SMOS and Sentinel-3
18. Discovery service
Build up metadata indexes of available datasets
Metadata checks during import (completeness/readable/correct vocabularies)
Include DOIs/PIDs of the original datasets
New DOIs will be assigned for newly processed datasets (SEANOE)
Use Elasticsearch to support fast response on searches
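The import-time metadata checks listed above (completeness and correct vocabularies) can be pictured with a small sketch. The required fields and the variable code list below are hypothetical placeholders chosen for the example; the slides do not specify the actual PHIDIAS rules.

```python
# Assumed minimum set of fields, loosely named after Dublin Core elements.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier")
# Hypothetical controlled vocabulary standing in for a real code list.
KNOWN_VARIABLES = {"salinity", "temperature", "chlorophyll"}


def check_record(record):
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing field: {field}")
    for variable in record.get("variables", []):
        if variable not in KNOWN_VARIABLES:
            problems.append(f"unknown variable: {variable}")
    return problems


good = {
    "title": "Surface salinity, North Atlantic",
    "creator": "Example Institute",
    "date": "2020-06-04",
    "identifier": "doi:10.0000/example",  # made-up identifier
    "variables": ["salinity"],
}
bad = dict(good, identifier="  ", variables=["salinty"])

print(check_record(good))
print(check_record(bad))
```

Returning every problem at once, rather than stopping at the first, lets a data provider fix a rejected record in a single pass.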
19. Metadata is important
The PHIDIAS catalogue metadata model will be based on Dublin Core elements and will be compliant with the Dublin Core standards. Where relevant, for example for geo-referenced data, metadata are made compatible with the ISO 19115 standard (e.g. by the addition of geographical extent…). The main information managed is:
General metadata (Dublin Core)
Title | Author(s) and affiliations (link with ORCID) | Publication date | Abstract | References | Use Conditions (Possible limitations…) | Reference to data user’s manual (if any)
Access conditions
Data License (Creative Commons license, ...) | Provided data citation in DataCite format | Access service(s) | Data format and size
Keywords (CodeLists provided):
Variables (link with the Essential Ocean Variables Code List) | Method(s) | Instrument(s) | Project(s)
Geographical extent
Min and Max latitudes and longitudes | Location map
Temporal extent
Data preview(s)
List of citing publications
…
20. Prototype Data Lake for processing
Two data types:
In-situ datasets:
not extremely large, but in many small files.
managed data types are heterogeneous: vertical profiles, time series, underway data...
Satellite datasets:
may be very large (several tens of petabytes in total), which makes them difficult to transfer over networks.
The “Data Lake” will be periodically synchronized (e.g. daily)
21. Different use cases, different storage (1)
For in-situ datasets - Online selection and visualization of data using a two-step discovery service via a common catalogue:
1) selection of “Data collections” / Datasets, and then
2) selection of the subset of data of interest.
Example: Exploring SeaDataNet (Common Data Index) and Copernicus Marine Services data collections including fast detection of co-localized data
Access to data will have to be optimized to select and retrieve a small amount of data among a large number of metadata records, using different selection criteria: geographical, temporal...
Prototype: Elasticsearch on top of a (No)SQL database, in order to allow faceting of the web selection portal, with optimized response time.
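The two-step selection and the faceting idea can be sketched in plain Python. This is an in-memory stand-in written for illustration, not Elasticsearch and not the PHIDIAS prototype; the record fields, function names and sample values are assumptions.

```python
from collections import Counter

# Hypothetical metadata records, one per dataset.
RECORDS = [
    {"collection": "SeaDataNet", "variable": "salinity", "lat": 48.0, "lon": -10.0, "year": 2015},
    {"collection": "SeaDataNet", "variable": "chlorophyll", "lat": 59.5, "lon": 21.0, "year": 2018},
    {"collection": "CMEMS", "variable": "salinity", "lat": 47.5, "lon": -9.0, "year": 2018},
    {"collection": "CMEMS", "variable": "salinity", "lat": 30.0, "lon": -40.0, "year": 2019},
]


def select_collections(records, names):
    """Step 1: keep only records from the chosen data collections."""
    return [r for r in records if r["collection"] in names]


def select_subset(records, bbox, years):
    """Step 2: keep records inside a bounding box and a range of years."""
    south, north, west, east = bbox
    first, last = years
    return [
        r for r in records
        if south <= r["lat"] <= north
        and west <= r["lon"] <= east
        and first <= r["year"] <= last
    ]


def facet_counts(records, field):
    """Count records per value of a field, as a portal facet would display."""
    return dict(Counter(r[field] for r in records))


step1 = select_collections(RECORDS, {"SeaDataNet", "CMEMS"})
step2 = select_subset(step1, bbox=(40.0, 55.0, -15.0, 0.0), years=(2010, 2020))
print(facet_counts(step2, "collection"))
```

A search engine replaces the linear scans with inverted and spatial indexes, which is what keeps response times low when there are many more metadata records than in this toy list.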
22. Different use cases, different storage (2)
Facilitate and improve access to data (especially for in-situ data) for fast and interoperable access for visualization and subsetting purposes (web portal): “access few data among many data”.
Output: “small” extracted data subsets and web-based maps and diagrams (representation of time-series and of vertical profiles).
Prototype: set up of the Data Lake by implementing a NoSQL database (e.g. Cassandra). This includes the synchronization procedures from distributed data sources to the adopted data structure within the Data Lake.
23. Different use cases, different storage (3)
Support on-demand data processing of large data subsets using DIVA or Pangeo
Requires high performance browsing and processing of large amounts of data (e.g. salinity and chlorophyll), preferably in parallel: “access many data among many data”.
Output: Gridded fields of Salinity and Chlorophyll.
Data lake prototype: “Data Cubes” which are used to access data using the Pangeo software components suite: e.g. Zarr format, Xarray, Parquet, Arrow.
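To make the idea of turning scattered observations into a gridded cube concrete, here is a minimal sketch that averages observations into (year, latitude, longitude) cells. It uses only plain Python dictionaries as a stand-in; it does not use Zarr, Xarray or any other Pangeo component, and the cell size and sample values are assumptions for the example.

```python
import math
from collections import defaultdict


def grid_mean(observations, cell_deg=1.0):
    """Average (year, lat, lon, value) observations into cube cells keyed by
    (year, lat_index, lon_index), where an index is floor(coordinate / cell_deg)."""
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for year, lat, lon, value in observations:
        key = (year, math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical salinity observations: (year, latitude, longitude, value).
observations = [
    (2019, 45.2, -20.7, 35.0),
    (2019, 45.8, -20.1, 36.0),  # falls in the same 1-degree cell as the first
    (2019, 46.1, -20.5, 34.0),
]
cube = grid_mean(observations)
print(cube)
```

A simple cell average like this is only a binning step; an analysis tool such as DIVA additionally fills empty cells and accounts for observation errors.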
24. Thank you
Gilbert Maudire (Ifremer), PHIDIAS WP6 Leader
Peter Thijsse (peter@maris.nl) and the PHIDIAS WP6 group
25. Passport photos for plankton: new era for marine biology research
Jukka Seppälä, Seppo Kaitala, Kaisa Kraft, Otso Velhonoja (SYKE)
Webinar | June 4, 2020, 11:00 AM CEST
26. Phytoplankton abundance is typically estimated using ocean colour, in situ sensors or lab analysis
Phytoplankton contribute 50% of the global photosynthesis: CO2 fixation and O2 production.
Due to measurement uncertainties and undersampling, the role of oceans – and phytoplankton – is one of the key unknowns in the global carbon budget.
We may observe the abundance of phytoplankton using Chlorophyll a as a proxy.
Figure: Long-term average concentration of chlorophyll at the ocean’s surface in milligrams per cubic meter of water. The data in this map were provided by the Joint Research Centre (JRC). Source EMODnet.
Figure: Seasonal concentration of chlorophyll in the Baltic Sea, between Helsinki (FI) and Travemünde (DE), measured with the ferrybox. Source Alg@line project, SYKE.
27. Species/group-specific information is crucial to understand the biogeochemical fluxes
Bulk biomass estimates by Chlorophyll a do not reflect the diversity of phytoplankton
Phytoplankton community composition is largely affected by environmental and anthropogenic forcing (light, nutrients, temperature)
Phytoplankton community composition responds very quickly to chaotic rhythms of aquatic environments
Phytoplankton community composition (and functional types) largely affects the aquatic elemental fluxes (carbon and nutrients) and structure of the food web (up to fish)
Photos of phytoplankton, taken by Imaging FlowCytobot at Utö station, Gulf of Finland
28. Why plankton imaging
Traditional microscopy is slow and costly (though an accurate and important reference method!)
New technologies based on optics, fluidics and imaging offer rapid, automated, unattended, quantitative, and cost-efficient analysis of individual cells and colonies of plankton organisms
Possibility to permanently store the digital raw data gathered, which allows re-analyses and the creation of open data archives within the international scientific community
Figure: Cyanobacterial bloom in the Baltic 2018, with 3 main species recorded at 20 min intervals. Kraft et al. in prep.
29. Plankton imaging – state of the art
Various technologies available, many in the beta-version/demonstration phase. Some forerunner technologies (e.g. Cytosense) have well established user communities and common vocabularies for metadata.
Machine learning algorithms available but optimisation/development ongoing
Central data storage not available, no agreed way to connect to data aggregators
The EcoTaxa web application is a European forerunner for visual exploration and taxonomic annotation of images. Initiated by Laboratoire d'Océanographie de Villefranche (LOV) https://ecotaxa.obs-vlfr.fr/
30. Imaging technology
IMAGING FLOWCYTOBOT at SYKE
Images of phytoplankton cells (range 10-150 µm)
Operated remotely on the Utö island flow-through system
Samples of 5 ml at approx. 20 min intervals
Camera triggered by chlorophyll-a fluorescence
Up to 30 000 high resolution images / hour
Random Forest algorithm for image recognition – moving towards Convolutional Neural Networks
31. Plankton imaging – PHIDIAS
Demonstration: from image to information
Imaging FlowCytobot (Finnish Environment Institute, Utö)
Finnish Meteorological Institute's server
CSC (Center for Scientific Computing, FI) Allas object storage
- Data storage and sharing during the project's duration
cPouta (Cloud computing)
- Development of CNN-models
- GPU flavor is needed
Puhti (high performance computing)
- CNN in production mode (classification of new images)
- GPU or CPU flavor
- Potential realtime usage
Data aggregators / other users
- EcoTaxa
- Long time data storage
Images, Labels
32. PHIDIAS, at the focal point for multiplatform detection of phytoplankton:
EO algorithms – sensor validation – ML, CNN – DIVA
Picture: Lauri Laakso, FMI
33. Thank you, stay tuned, and see you again!
Jukka Seppälä, SYKE
jukka.seppala@ymparisto.fi
Special thanks to SYKE, FMI, LUT and CSC staff supporting the various steps of plankton imaging!
34. Analyzing ocean observations in an HPC infrastructure with DIVAnd
Alexander Barth, Charles Troupin, University of Liège
35. The ocean is complex...
● Many ocean processes are present simultaneously
● Non-linear interaction between them
● Wide time/space spectrum of scales
● → High diversity of ocean observations
Image creation: Center for Environmental Visualization, University of Washington
36. … and is complex to observe
The types of observations are quite diverse
Ocean observations are sparse (because expensive)
Yet scientifically very valuable (a measurement not taken is lost forever; the state of the climate, and of the ocean in particular, changes)
Image credits: ICTS SOCIB
37. Challenges in ocean data analysis
Fast access to data, multitude of formats, general trend towards netCDF
Different programming environments/languages used by scientists:
• Fortran (still used in numerical models)
• Matlab (very widespread ~10 years ago, but less used today)
• Python
• R
But also Julia, C, C++, shell scripts,...
38. Switching to Julia language
● At GHER, ULiège: started to use Julia in 2017
● Julia version 1.0 was released on 8 August 2018
39. DIVAnd
● DIVA: Data Interpolating Variational Analysis
● Objective: derive a gridded climatology from in situ observations
● The variational inverse methods aim to derive a continuous field which is:
○ close to the observations (it should not necessarily pass through all observations because observations have errors)
○ "smooth"
● Spline interpolation
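The trade-off between staying close to the observations and being smooth can be illustrated with a one-dimensional toy problem. The sketch below is not DIVAnd and does not reproduce its cost function: it assumes a simplified cost J(x) = Σ (x[k] - d)² / eps2 + smooth · Σ (x[i+1] - x[i])², whose minimum is the solution of a tridiagonal linear system. The names `analyse_1d`, `eps2` and `smooth` are chosen for this example.

```python
def analyse_1d(n, obs, eps2=0.1, smooth=1.0):
    """Minimise sum((x[k]-d)^2)/eps2 + smooth*sum((x[i+1]-x[i])^2) on n grid
    points. obs is a non-empty list of (grid_index, value) pairs."""
    diag = [0.0] * n
    rhs = [0.0] * n
    # Smoothness term: each neighbouring pair adds `smooth` to both diagonals.
    for i in range(n - 1):
        diag[i] += smooth
        diag[i + 1] += smooth
    # Observation term: weight 1/eps2 at every observed grid point.
    for k, d in obs:
        diag[k] += 1.0 / eps2
        rhs[k] += d / eps2
    off = -smooth  # constant sub- and super-diagonal entry
    # Thomas algorithm: forward elimination, then back substitution.
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = off / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (rhs[i] - off * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# Two observations at the ends of a 5-point grid.
field = analyse_1d(5, [(0, 1.0), (4, 3.0)])
print(field)
```

With two observations at the ends, the analysed field varies linearly in between and does not pass exactly through the observed values, because `eps2` tells the analysis that observations carry errors.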
40. DIVAnd
● Workshops
● Virtual Research Environment (VRE) in SeaDataCloud
● Jupyter Notebooks
● CI (Continuous Integration) testing (Linux, Mac OS, Windows)
● Docker and Singularity images with preconfigured software
41. DIVAnd in a virtual research environment
https://vre.seadatanet.org/
42. BlueCloud VRE
BlueCloud VRE will also include DIVAnd
43. Computing resource
● DIVAnd needs to solve a large matrix system
● The solvers:
○ direct solver (SuiteSparse, CHOLMOD): requires a significant amount of memory but is very fast
○ iterative solvers (preconditioned conjugate gradient): more memory efficient but slower
● In practice: the direct solver is preferred as long as the problem fits into the available memory
● But having access to computing resources with sufficient memory has been a problem for our users (SeaDataCloud, EMODnet Chemistry)
● Code portability via Singularity container
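The memory argument for iterative solvers can be seen in a small sketch of the conjugate gradient method. This is a plain, unpreconditioned version written for illustration, not the solver used in DIVAnd: it only needs a function that applies the matrix to a vector, so the matrix itself (and any factorisation of it) is never stored. The helper `tridiagonal_matvec` and the test system are assumptions for the example.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given only as matvec."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def tridiagonal_matvec(diag, off):
    """Return a function applying a symmetric tridiagonal matrix with the
    given diagonal and a constant off-diagonal entry."""
    def apply(v):
        out = [diag[i] * v[i] for i in range(len(v))]
        for i in range(len(v) - 1):
            out[i] += off * v[i + 1]
            out[i + 1] += off * v[i]
        return out
    return apply


# Small system whose exact solution is [1, 2, 3].
apply_a = tridiagonal_matvec([3.0, 3.0, 3.0], -1.0)
solution = conjugate_gradient(apply_a, [1.0, 2.0, 7.0])
print(solution)
```

A direct solver would instead factorise the matrix once, which is fast to reuse but can need far more memory than the matrix itself because of fill-in.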
44. DINCAE
● Paper: Data INterpolating Convolutional Auto-Encoder
● Neural network to reconstruct missing data in satellite images (in particular clouds in remotely sensed Sea Surface Temperature)
● Originally written in Python using TensorFlow 1
● Many changes in TensorFlow 2 -> better alternatives?
● Use Julia with the Knet library
● Training time of the network was reduced from 3.5 hours to 1.9 hours (on an NVIDIA 1080 GPU)
● We use “data augmentation” (in particular perturbing input data, adding additional clouds,...) using vectorized NumPy code, but it could be made significantly faster by using Julia instead.
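The "additional clouds" form of data augmentation can be sketched as follows. This is not the DINCAE code: it uses plain Python lists rather than vectorized NumPy so that it runs without dependencies, and the square patch shape, the patch count and the function name `add_clouds` are assumptions made for the example.

```python
import random


def add_clouds(field, n_patches=2, patch=2, seed=None):
    """Return a copy of a 2D field (list of rows) in which extra square
    patches are set to None, imitating additional cloud cover."""
    rng = random.Random(seed)
    rows, cols = len(field), len(field[0])
    out = [row[:] for row in field]  # copy so the input stays untouched
    for _ in range(n_patches):
        i0 = rng.randrange(rows - patch + 1)
        j0 = rng.randrange(cols - patch + 1)
        for i in range(i0, i0 + patch):
            for j in range(j0, j0 + patch):
                out[i][j] = None
    return out


# Hypothetical 5 x 6 sea surface temperature field with one real cloud pixel.
sst = [[15.0 + i + 0.1 * j for j in range(6)] for i in range(5)]
sst[0][0] = None
masked = add_clouds(sst, seed=1)
```

Because the hidden values are still known in the original field, they can serve as targets during training or as withheld data for validation.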
45. Some results with DINCAE
● Sea Surface Temperature (SST) reconstruction with DINCAE
● Some data is withheld during the reconstruction (i.e. additional clouds)
● SST is reconstructed and a reliable expected error standard deviation is computed
Figure: DINCAE reconstruction using MODIS sea surface temperature in the Adriatic
46. Conclusions
● The types of available ocean data are quite diverse
● Fortran is still widely used in the oceanographic HPC community
○ But there are significant challenges to support users outside of a typical HPC environment
○ Julia has been a good fit for us for data analysis
● The original Fortran tool DIVA has been rewritten in Julia (DIVAnd)
● Jupyter notebooks provide the users a convenient interface that can also be used in a Virtual Research Environment (especially for data exploration)
● In future: adapt existing tools or adopt new algorithms able to leverage GPUs (or other accelerators)
49. The mission
Blue-Cloud aims to pilot a cyber platform bringing together and providing access to
1. multidisciplinary data from observations and models
2. analytical tools
3. computing facilities
to support research to better understand and manage the many aspects of ocean sustainability
Boosting the use of cloud services for marine data management, services and processing | 4 June 2020
50. The Leading Concepts
Developing and deploying a cloud platform with a Virtual Research Environment (VRE) with an array of services for configuring Virtual Labs for specific analytical workflows, use cases and demonstrators
Applying common standards and interoperability solutions for providing harmonized data and metadata
Developing and deploying harmonised discovery and access to a series of established European marine data management and processing infrastructures that are dealing with major marine and ocean data collections, related data centres, and their data providers
Diagram labels: Discovery and access to datasets from many sources | Upstream Services | VRE – Cloud Platform | Downstream Services | Added-value services and applications | Standards (OGC, ISO, W3C) & Vocabularies
51. The Technical Framework
A component to serve federated discovery and access
• Bridging blue data infrastructures and their multi-disciplinary data from observations, in-situ and remote sensing, data products and outputs of numerical models
A component to serve as Blue Cloud Virtual Research Environment (VRE)
• Federating computing platforms and analytical services; this will include Virtual Labs for each of the use case Demonstrators
52. Blue-Cloud federation of major infrastructures
Blue Data infrastructures E-infrastructures
53. Blue-Cloud Virtual Research Environment
Exploits Blue-Cloud data discovery and access service
Federates computing platforms and algorithms
Interacts with external systems
Exposes all repositories, algorithms, and computing platforms as a common unified space of resources
Serves diverse communities of researchers
54. Blue-Cloud Framework satisfies Open Science Requirements
Support collaborative research and experimentation
Implement Reproducibility-Repeatability-Reusability of Science
Allow sharing of data, processes and findings
Grant open access to the produced scientific knowledge
Tackle Big Data challenges
Manage heterogeneous data/processes access policies
Sustainability: low operational costs, low maintenance prices
55. Tuning, testing and promoting with five demonstrators
Zoo- and Phytoplankton EOV products
Plankton Genomics
Marine Environmental Indicators
Fish, a matter of scales
Aquaculture Monitor
Domains: Biodiversity, Environment, Fishery, Aquaculture, Genomics
56. Function of Demonstrators
Demonstrate how the Services developed contribute to unlocking innovation potential
• to derive requirements and specifications for the Pilot Blue Cloud platform development
• to demonstrate the potential of cloud-based open science in the marine community
• to serve as a catalyst for wider community engagement, identifying longer term challenges, and planning future developments from pilot to a full-scale Blue-Cloud infrastructure.
Identify the scientific communities' requirements
• Storage (repositories, warehouses, …)
• Multidisciplinary data access and harmonisation
• Analytical processes
• Computing requirements
57. Piloting an EOSC “thematic cloud”
58. Blue-Cloud project
• Funding: H2020: The ‘Future of Seas and Oceans Flagship Initiative’ (BG-07-2019-2020) topic: [A] 2019 - Blue Cloud services
• Timing: 36 Months (start October 2019)
• Budget: 5.9 Million Euro
• Partnership: 20 partners
59. Any questions?
https://blue-cloud.org
- Marine data come from different sources.
- Diversity of data implies having a good description of them: metadata, catalogues, common vocabularies, ... from a FAIR principles perspective (introduction to your presentation, Peter).
- Some datasets are quite large or include numerous observations (such as plankton images). In addition, having different data collections (satellite ocean colour, plankton, ...) stored in different locations makes it necessary to improve data access for better processing performance (introduction to Jukka's presentation).
- Processing data then requires data analysis software and powerful IT infrastructures (HPC, HPDA) available to users (introduction to your presentation, Charles).
We focus in this presentation on the data access and storage to support processing