The main focus of Science Demonstrator sessions is to provide feedback to the EOSC community on the first experience of science demonstrators in the practical use of the emerging EOSC ecosystem.
Each panel will consist of a representative of a Science Demonstrator that will provide an overview of their experiences in the use of emerging EOSC services.
These sessions will help members of the scientific communities understand the current state of maturity of the EOSC ecosystem and what is achievable in a given field of scientific research. They are also valuable to prospective Service Providers who wish to discover the challenges and opportunities that user communities might face as a result of adopting their services.
This session will focus on Social and Earth Sciences.
3. The Science Challenge
www.eoscpilot.eu
The European Open Science Cloud for Research pilot project is funded by
the European Commission, DG Research & Innovation under contract no.
739563
• Create a text mining tool to identify named entities in
archaeological reports (text files)
• The entities broadly answer to “What-Where-When”
questions according to the domain ontology CIDOC CRM
• Identified entities enrich the document metadata and
enable finding it and linking it to other related datasets
• The result is of paramount importance as most of the
documentation in archaeology and in cultural heritage is
textual
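The "What-Where-When" tagging described above can be sketched with a simple vocabulary lookup. This is only an illustration under stated assumptions: the vocabularies, the sample sentence and the function names `tag_entities` and `to_metadata` are hypothetical, and the actual tool relies on linguistic services and domain vocabularies rather than plain string matching.

```python
import re

# Hypothetical mini-vocabularies (assumption); a real system would load
# curated domain vocabularies instead of these hand-picked terms.
VOCAB = {
    "What": ["amphora", "mosaic", "necropolis"],
    "Where": ["Pompeii", "Ostia"],
    "When": ["Bronze Age", "Roman period"],
}

def tag_entities(text):
    """Return sorted (start, end, label, surface) tuples for vocabulary hits."""
    hits = []
    for label, terms in VOCAB.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((m.start(), m.end(), label, m.group(0)))
    return sorted(hits)

def to_metadata(text):
    """Group the distinct surface forms found in the text under each question."""
    meta = {label: [] for label in VOCAB}
    for _, _, label, surface in tag_entities(text):
        if surface not in meta[label]:
            meta[label].append(surface)
    return meta

report = "A Roman period amphora was recovered near the necropolis at Ostia."
print(to_metadata(report))
```

The resulting dictionary is the kind of enrichment that could be attached to a report's metadata so that it can be found and linked to related datasets.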
4. The Science Demonstrator
• For the Science Demonstrator, we planned to work on
texts in Italian, as similar work had been done for English
and Dutch (but not in a cloud framework)
• Success rate should reach at least 90%
• Domain vocabularies are paramount for the work, so a
revision of the available ones was also planned
• Linguistic services were also needed; they were already available:
• GATE (Stanford)
- Word Segmentation
- Part of Speech recognition and tagging (POS)
• OpenNER (CNR-ILC)
- Named Entities Recognition and resolution
5. Successes
• The work was completed and the tool is working very well
• The required services were all made available and working
• The tool is now being tested on further sets of textual
documents
• It is planned to incorporate it in the ARIADNEplus cloud
services, making it multilingual
• ARIADNEplus is an integrating activity on archaeological datasets
due to start at the end of 2018
6. Issues
Apart from issues specific to the service (vocabulary
availability, stemmers, etc.) the main issues concerned the
following:
• Cumbersome AAI
• The user interface, which needed to be adapted to the one already
available for accessing the cloud. This includes choosing the
data to be processed and getting the results.
Both of the above will be addressed in the ARIADNEplus project.
7. Lessons Learned
Apart from improving the efficiency of the service (which is
already in the plans) to achieve the necessary scalability, the
necessary changes concern:
• Making its use easier in the cloud
• As regards the EOSC ecosystem, it must be (created? and)
adapted to the researchers’ needs rather than to an a priori
architecture. This requires a substantial effort, which
is perhaps underestimated, and the researchers’
participation in it.
8. Visual Media Service
R. Scopigno, F. Ponchio
9. The Science Challenge
Digital representation with visual media is of paramount
importance for Cultural Heritage (CH) research
Goals of Visual Media Service:
• Easy publication on the web to share with peers and the
public, one-click approach, managing different types of
data (2D hi-res images, RTI, 3D models)
• Availability of customizable visualization tools
• Permanent repository space, search & retrieval features
• All code is open source
10. The Science Demonstrator
Planned work and EOSC related services:
• Support for users’ authentication (D4Science, Google)
• Implementation of several extensions (data management,
visualization tools and configurability)
• Integration with VRE (D4Science)
• Evaluation of resource usage and possible bottlenecks
(after the user study); possibly adopt parallelization and
cloud technologies provided by EOSC in the near-future
evolution of the Visual Media Service
18. Successes
We are near completion of our SD:
• User authentication integrated in the VisMediaService
• Several updates and extensions introduced in the
VisMediaService
• User testing started in October:
• Large number of visual files uploaded in the system by users
• Early evaluation of user satisfaction is largely positive
• Users’ appreciation for the new features and for the integration
with the VRE (D4Science)
19. Issues
No major issue to report
Understanding how to include EOSC features/support was quite easy,
thanks to the good documentation and easy contact with the developers
(some of whom are staff at our own CNR institute)
Good support from the shepherds
For newcomers, it is not easy to understand the overall framework of
the EOSC project
Remaining issues:
• Evaluate the performance of the system under load (a large number of
concurrent users, both during data upload and interactive
visualization)
• The latter is planned for the start-up of the new EC infrastructure “ARIADNE+”
(which provides a very large community of CH users) in early 2019
20. Lessons Learned
What are the main things you would change in your science
demonstrator?
● Plan the user testing and verification earlier in the schedule
(giving less emphasis to the implementation of new features)
● Also allow sufficient time to evaluate performance needs and the
corresponding EOSC solutions
What are the main things you would change in the EOSC
ecosystem?
• Provide a structured and complete introduction and training for
newcomers (e.g. SD proponents who are not EOSC partners), to
support a faster start-up and a better understanding of the
project resources and who-is-who. A one-week course (with written
material)?
• Define a reward program/policy for people providing open data
21. Frictionless Data Exchange
Petr Knoth, Open University
22. The Science Challenge
● A single scientific repository is of limited value
● Real benefits come from the ability to exchange data
within a network
● The current technology for exchanging data across
repositories (OAI-PMH) is more than 15 years old
○ Scalability
○ Inconsistent implementations
○ Metadata synchronisation only
23. The Science Demonstrator
● Assess how scientific resources can be
effectively, regularly and reliably exchanged
across systems using the ResourceSync protocol.
● Conduct a set of experiments/benchmarks
comparing OAI-PMH with ResourceSync along a
set of dimensions, scenarios and
implementation setups.
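To make the ResourceSync side of the comparison more concrete, here is a minimal sketch of how a destination might decide which resources to fetch from a source's Resource List (a Sitemap document with ResourceSync metadata). The repository URLs, timestamps and the helper names `parse_ts` and `changed_since` are assumptions made for illustration; a real client would typically also consume Change Lists and Resource Dumps.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespaces used by Sitemap documents and the ResourceSync extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

# Hypothetical Resource List; the URLs and timestamps are made up.
RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2018-11-01T00:00:00Z"/>
  <url>
    <loc>http://repo.example.org/paper1.pdf</loc>
    <lastmod>2018-10-01T09:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://repo.example.org/paper2.pdf</loc>
    <lastmod>2018-10-20T12:30:00Z</lastmod>
  </url>
</urlset>"""

def parse_ts(value):
    """Parse a UTC timestamp of the form 2018-10-01T09:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

def changed_since(xml_text, last_sync):
    """List resource URLs modified after last_sync (or lacking a lastmod)."""
    root = ET.fromstring(xml_text)
    cutoff = parse_ts(last_sync)
    changed = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None or parse_ts(lastmod) > cutoff:
            changed.append(loc)
    return changed

print(changed_since(RESOURCE_LIST, "2018-10-15T00:00:00Z"))
```

Unlike an OAI-PMH harvest, which returns metadata records, the entries here point directly at the resources themselves, so content as well as metadata can be synchronised.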
24. Successes
● Conducted experiments: different repository platforms, over 1k
repository systems, variety of configurations and scenarios =>
quantitative benchmark and case for adoption of ResourceSync.
● New synchronisation approach using On Demand Resource Dumps
● Updated and more scalable ResourceSync implementation.
● Adoption of technology in practice: paying customers for CORE,
including Naver Academic and others.
● Interviews and recommendations for TEXTCROWD and High Energy
Physics SDs
● Supported ARC (OpenAIRE) in their efforts to adopt
ResourceSync
25. Issues encountered
● Time scalability issues of ResourceSync in scenarios requiring
the synchronisation of large numbers of small files
● Setting up a fair benchmarking environment
○ Drawing comparisons only for synchronisation tasks conducted on
exactly the same data
○ Planning to limit the effect of network latency on the results
○ Analysing variance in response time and average resource size of
repositories per repository platform (for example, EPrints, DSpace,
OJS, etc.).
● Finding a reliable baseline for recall benchmark.
● Knowledge and skills gap between technologists
who implement repositories and those who manage them
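The benchmark quantities mentioned above (recall against a baseline, and response-time variance per repository platform) can be sketched as follows. The identifiers, timings and function names are hypothetical and only show the arithmetic; they are not results from the demonstrator.

```python
from statistics import mean, pvariance

def recall(harvested_ids, baseline_ids):
    """Fraction of baseline records that the synchronisation run retrieved."""
    baseline = set(baseline_ids)
    if not baseline:
        raise ValueError("baseline must not be empty")
    return len(baseline & set(harvested_ids)) / len(baseline)

def summarize(times_by_platform):
    """Map each platform to (mean, population variance) of response times."""
    return {
        platform: (mean(times), pvariance(times))
        for platform, times in times_by_platform.items()
    }

# Made-up record identifiers and response times in seconds (assumption).
baseline = ["a", "b", "c", "d"]
harvested = ["a", "b", "d", "x"]
timings = {"EPrints": [1.0, 3.0], "DSpace": [2.0, 2.0]}

print(recall(harvested, baseline))
print(summarize(timings))
```

Recall is only as trustworthy as the baseline set, which is why finding a reliable baseline is listed as an issue in its own right.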
26. Lessons Learned
● The tested approach is widely applicable, with clear benefits
over currently used technology.
● Recommend ResourceSync as a default data and
metadata exchange protocol for all repositories operating
within EOSC.
● Need a communication channel to work with WP6
Interoperability (and others) to communicate results and
decide next steps.
27. ENVRI Radiative
Forcing Integration
Ville Kasurinen
28. The Science Challenge
- Focus on dynamics of greenhouse gases, aerosols and
clouds and their role in radiative forcing:
- https://eoscpilot.eu/science-demos/envri-radiative-forcing-integration
- Interoperability between observations and climate
modeling
- Co-operation between environmental research
infrastructures (ICOS – ACTRIS)
- Model runs using ESM outputs as an input for a dynamic
vegetation model (LPJ-GUESS)
29. The Science Demonstrator
- Testing interoperability between different data sources
- Using different forcing data sets in model runs
- Comparisons between simulated and flux tower data
- Using VM in EGI cloud
- EGI Docker (Ubuntu 16.04) 8 GB RAM, 8 Cores + 1 TB block storage
- Input data from datahub (OneData)
- 2 TB diskspace
30. Successes
- OneData volume hosted IS-ENES data sets (1-2 TB)
- Virtual platform for processing input data and running
dynamic vegetation model LPJ-GUESS
- Comparison of simulated carbon and water fluxes to in-situ
measurements (Fluxnet 2015)
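A comparison between simulated and measured fluxes often starts with simple summary statistics. The sketch below computes mean bias and root-mean-square error for paired values; the numbers and function names are hypothetical, and the slides do not state which metrics the demonstrator actually used.

```python
import math

def paired(simulated, observed):
    """Keep only time steps where both the model and the tower have a value."""
    return [(s, o) for s, o in zip(simulated, observed)
            if s is not None and o is not None]

def bias(simulated, observed):
    """Mean of (simulated - observed); positive means the model overestimates."""
    pairs = paired(simulated, observed)
    return sum(s - o for s, o in pairs) / len(pairs)

def rmse(simulated, observed):
    """Root-mean-square error between simulated and observed values."""
    pairs = paired(simulated, observed)
    return math.sqrt(sum((s - o) ** 2 for s, o in pairs) / len(pairs))

# Made-up flux values on a common time axis; None marks a gap in the record.
sim = [2.0, 4.0, 6.0, 5.0]
obs = [1.0, 5.0, 6.0, None]

print(bias(sim, obs))
print(rmse(sim, obs))
```

Both series must already share the same time steps and units, which in practice is where much of the harmonisation effort goes.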
31. Issues
- A stable volume storing input data sets?
- File processing limited by the network performance
- A need to modify input data before model runs
- No standardized way to compare flux tower and model
outputs
- Metadata synchronization is lacking (ESM - Flux towers –
land surface models)
32. Lessons Learned
- Interoperability is a challenge when using data from
different sources
- Interoperability between data sets needs significant
improvements.
- ICOS can try to solve these issues partly
- Land surface model requirements for input data are
difficult to fulfil
- Model-dependent outputs vs model-dependent requirements
33. EPOS/VERCE Earthquake
simulation Platform
André Gemünd - Fraunhofer SCAI
34. The Science Challenge
Earthquake Simulation: Production of
synthetic seismograms for public and
custom Earth models and Earthquakes via
the execution of HPC simulation codes
(SPECFEM3D & Globe)
Raw data acquisition & Misfit: The model is
evaluated and further improved by
comparing the synthetic data with real
observations collected by institutional
archives, adopting Data-Intensive workflows.
35. The Science Demonstrator
Enhancing the services of the VERCE portal and integrating the EGI
FedCloud Infrastructure as the main data-intensive computational
service provider for Misfit Analysis Workflows.
These consist of three phases:
- Assisted discovery & preprocessing of the observed data and
the corresponding synthetic results
- Data pre-staging from the FDSN network to an iRODS instance
with metadata and provenance
- Final comparison adopting different Misfit techniques.
AAI and delegation mechanisms are needed to submit executions
and to connect to remote data-stores (iRODS) from the Cloud
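As an illustration of the final comparison phase, the sketch below computes a least-squares waveform misfit between an observed and a synthetic trace. The slides do not say which misfit techniques are used, so this particular measure, the sample values and the function name `l2_misfit` are assumptions chosen for simplicity.

```python
def l2_misfit(observed, synthetic, dt):
    """Least-squares waveform misfit: 0.5 * sum((s - d)^2) * dt.

    Both traces are assumed to be preprocessed already, i.e. sampled at the
    same times with the same sampling interval dt (in seconds).
    """
    if len(observed) != len(synthetic):
        raise ValueError("traces must have the same number of samples")
    return 0.5 * dt * sum((s - d) ** 2 for s, d in zip(synthetic, observed))

# Made-up four-sample traces (assumption), sampled every 0.5 s.
observed = [0.0, 1.0, 0.0, -1.0]
synthetic = [0.0, 0.5, 0.0, -0.5]

print(l2_misfit(observed, synthetic, dt=0.5))
```

A misfit of zero means the synthetic trace reproduces the observation exactly; larger values indicate that the Earth model or source parameters need further refinement.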
36. Successes
Processing workflows enabling the Misfit analysis (data download,
preprocessing and misfit) have been refactored to support their
execution on EGI FedCloud resources.
The lineage services have been upgraded to a later version of the
S-ProvFlow system, improving the interactive exploration of
lineage information and delivering PROV format for interoperable
provenance analysis.
The portal has been extended to allow the retrieval of Per-User
Sub-Proxy certificates from the eToken proxy certificate
in addition to its community-specific IdP. Login via OpenID
Connect through the EGI Check-In service has been successfully
validated.
38. Issues
● Not much technological guidance: Choice left to communities, e.g.
Cloud-ready workflow system, key/value store, queueing,
scheduling, interoperability, Storage
● Inherited legacy codebase and components (gUSE, GridFTP, Globus)
● Federated Authentication was still subject to change during Pilot
(RCauth SLC service not yet working)
● No built-in credential delegation e.g. proxy certificate delegation
● Differences in network implementations on FedCloud sites (required
floating IP requests on some sites, not implemented by middleware
at the time)
● Virtual Appliances needed to be upgraded (was done in SD and
committed back upstream to EGI AppDB)
39. Lessons Learned
● Give more weight to sustainability when choosing tools and
services
● Prefer simplicity in customization and adaptation, e.g. scripted
solutions / plugin mechanisms
● Adapt to external services instead of self-hosting if in doubt
● EOSC could “feature” software and tools, promoting them to
the communities with sufficient description of their usage and
transparency about their sustainability plans.
● The reproducibility problem should start to be addressed
structurally, scaling from ad-hoc solutions to reusable and
more general services. Computational tools offered by EOSC
should be aware of the existence of these services and use
them.
Editor's notes
The problem: a single scientific repository has limited value; real benefits come from the ability to exchange data within a network.