Aspects of Reproducibility in Earth Science
1. Aspects of Reproducibility in Earth Science – ongoing work
Raul Palma
Poznan Supercomputing and Networking Center, Poland
Dagstuhl seminar: Reproducibility of Data-Oriented Experiments in e-Science
January, 2016
2. Context
Acronym: EVER-EST
Full title: European Virtual Environment for Research - Earth Science Themes
Type of funding scheme: Research and Innovation Actions
Work Programme topic addressed: Call EINFRA-9-2015 – e-Infrastructures for Virtual Research Environments (VRE)
• Project ID: 674907
• Project Type: RIA
• Start Date: 01.10.2015
• Duration: 36 Months
• Website: TBC
• Maximum Grant Amount: 6,649,002 €
• Total funded effort: 663 person-months
• Coordinator: European Space Agency
• Contact Person: Mirko Albani (ESA)
4. Key objectives
Establish a VRE e-infrastructure for Earth Science (ES) that addresses the needs of different ES communities and facilitates their collaborative work and research
Discover, access, assess and process existing and new heterogeneous ES datasets and preserved knowledge held by distributed data centres
Share data, models, algorithms, scientific results and their own experiences within a community or across communities
Capture, annotate and store the workflows, processes and results of their research activities
Ensure the long-term sustainability and preservation of data, models, workflows, tools and services developed by existing communities
Validate the VRE with four main Virtual Research Communities (VRCs):
Sea Monitoring VRC
Natural Hazards VRC (floods, geological, weather, wildfires)
Land Monitoring VRC
Supersites VRC (volcanoes and seismic)
5. Key objectives
Define, implement and validate Research Object (RO) concepts and technologies within the ES context as the means for sharing information and establishing more effective collaboration in the VRE
9. Supersite Science - ES VRC (a more concrete story)
A historical science, based mostly on past observations rather than on experiments
Hypothesis testing is not normally the main activity
Main activities of the VRC:
measure geophysical parameters in the natural environment,
derive information on the effects of the phenomena and processes,
model this information to generate space/time representations of geophysical phenomena,
provide these representations to risk management stakeholders,
use the information to develop theories or confirm hypotheses
10. Supersite VRC operational scenario
With a Supersite agreement in place:
In situ data providers (normally local monitoring agencies) provide open access to their data collections (under a data policy), including raw and processed data
Space agencies acquire and distribute satellite EO data (subject to signing personal licenses)
Authorized scientists should be able to access and display the data online, process them using community tools, validate the results, model the validated data, generate research products and build consensus on scientific information for end-users
Authorized (local) end-users should be able to access the scientific information online and provide feedback
The general public should be able to browse part of the data, the published results and part of the scientific information provided to end-users (if those end-users authorize disclosure)
11. Research Objects in Supersite VRC
Current main use scenarios:
Documentation/communication
Reproducibility of scientific results
12. Research Objects in Supersite VRC
Documentation/communication:
Document best practices (WFs, analysis methods, monitoring methods, etc.)
Support training
Provide long-term preservation of scientific knowledge (how data are analyzed, how results are validated, etc.)
Provide long-term preservation of end-user stories (demonstrating scientist/end-user interactions)
Support public dissemination
Provide good management of intellectual property, through licensing and PIDs/DOIs, to allow fast recognition of work
Others TBD
13. Research Objects in Supersite VRC
Reproducibility of scientific results:
Execute “standard” WFs for data analysis/modelling, for:
validating results
generating “standard” products (e.g. deformation maps) as mass products
training
Test algorithms and data, either by:
modifying the WF to execute new analysis methods/models on the same dataset, or
executing the original WF on different Supersite datasets
Others TBD
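Re-executions can take two forms: a “standard” WF re-run on another Supersite dataset, or a modified WF run on the same dataset. Comparing them is easier if every run is described by what went into it. Below is a minimal Python sketch. It assumes a run can be characterized by three things: a workflow identifier, the identifiers of its input datasets, and its parameters. `RunRecord`, `what_changed` and all example values are hypothetical and do not come from any EVER-EST tool. Each record is serialized as canonical JSON and hashed with SHA-256, so identical runs get identical fingerprints.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RunRecord:
    """Hypothetical description of one workflow execution."""
    workflow: str                      # workflow identifier, including its version
    inputs: tuple                      # identifiers of the input datasets
    parameters: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        """SHA-256 of a canonical JSON encoding: equal records give equal hashes."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def what_changed(a: RunRecord, b: RunRecord) -> list:
    """Names of the components (workflow, inputs, parameters) that differ."""
    return [name for name in ("workflow", "inputs", "parameters")
            if getattr(a, name) != getattr(b, name)]
```

For example, compare `RunRecord("insar-ts-v1", ("stack-A",), {"coherence": 0.3})` with the same record pointing at `("stack-B",)`. `what_changed` returns `["inputs"]`, which is the case of executing the original WF on a different dataset.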
14. Some issues in reproducibility
The VRC is not (yet) using formalized WFs. Their use, and the use of ROs, must be promoted through a simple, incremental approach.
Data access may be tricky, since formats and metadata can depend on the Supersite.
Some datasets (and most results) are not maintained by external sources and should be stored in the VRE (and exported as web services to the outside).
WF reproducibility can be a problem, since WFs may use a mix of COTS and scientific SW, raising licensing, HW compatibility and computational resource issues.
The VRC does not use web processing services at present.
WFs are rarely fully automated:
Some may require considerable manual intervention.
Others use a trial-and-error procedure: during repeated executions one may discard some data or choose different parameters.
In general, some internal WF decisions may be based on expert judgment and should be documented.
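Some WF decisions rest on expert judgment. One lightweight way to document them is to log each manual step alongside the run. The minimal Python sketch below assumes a simple schema: the step, the action taken, its details, and a free-text rationale. The `Decision` and `DecisionLog` names, the field names and the example values are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class Decision:
    """One manual decision taken during a workflow run (hypothetical schema)."""
    step: str        # workflow step where the decision was taken
    action: str      # e.g. "discard_data" or "set_parameter"
    detail: dict     # what was discarded, or which value was chosen
    rationale: str   # the expert judgment behind the choice, as free text
    timestamp: str   # UTC time at which the decision was recorded


class DecisionLog:
    """Append-only list of decisions that can be exported as JSON."""

    def __init__(self) -> None:
        self.entries = []  # Decision objects, in the order they were recorded

    def record(self, step: str, action: str, detail: dict, rationale: str) -> Decision:
        entry = Decision(step, action, detail, rationale,
                         datetime.now(timezone.utc).isoformat())
        self.entries.append(entry)
        return entry

    def to_json(self) -> str:
        return json.dumps([asdict(e) for e in self.entries], indent=2)
```

A call such as `log.record("interferogram selection", "discard_data", {"images": ["img_07"]}, "low coherence after visual inspection")` captures one trial-and-error choice. The JSON output is one possible form in which this documentation could travel with the results, for instance as an annotation in an RO.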
16. RO example for the Supersite VRC
Ground deformation mapping is a typical use case for this VRC.
It may be carried out by different researchers on different volcanoes, or even on the same volcano.
It normally consists of two consecutive WFs:
the analysis of a multitemporal InSAR image dataset to calculate ground displacement time series
the validation of the results by comparison with other data or results.
RO for Volcano deformation mapping
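The structure of such an RO can be illustrated with a hypothetical manifest in Python. It aggregates the datasets and the two consecutive WFs, and records which outputs feed which inputs. The field names (`aggregates`, `type`, `inputs`, `outputs`, `annotations`) and the identifiers are assumptions made for illustration; they do not follow the RO Bundle or any other specification exactly. `check_chain` only verifies that each WF consumes data that exist before it runs.

```python
# Hypothetical RO manifest for volcano deformation mapping; field names and
# identifiers are illustrative, not taken from any RO specification.
ro = {
    "title": "RO for Volcano deformation mapping",
    "aggregates": [
        {"id": "data/insar_stack", "type": "Dataset",
         "description": "multitemporal InSAR image dataset"},
        {"id": "data/gps_timeseries", "type": "Dataset",
         "description": "validation data (GPS time series)"},
        {"id": "wf/insar_analysis", "type": "Workflow",
         "inputs": ["data/insar_stack"],
         "outputs": ["results/displacement_ts"]},
        {"id": "wf/validation", "type": "Workflow",
         "inputs": ["results/displacement_ts", "data/gps_timeseries"],
         "outputs": ["results/validation_report"]},
    ],
    "annotations": [
        {"about": "wf/insar_analysis", "content": "documentation of expert choices"},
    ],
}


def check_chain(manifest: dict) -> list:
    """Return WF inputs that are neither aggregated datasets nor outputs of an earlier WF."""
    available = {r["id"] for r in manifest["aggregates"] if r["type"] == "Dataset"}
    missing = []
    for res in manifest["aggregates"]:
        if res["type"] != "Workflow":
            continue
        missing += [i for i in res["inputs"] if i not in available]
        available.update(res["outputs"])  # later WFs may consume these outputs
    return missing
```

Here `check_chain(ro)` returns an empty list. Listing the validation WF before the analysis WF would instead report `"results/displacement_ts"` as missing.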
17. RO example for the Supersite VRC
The main engine of the WF is the analysis SW (COTS): SARscape, which requires IDL.
Other scientists may prefer other SW, or even remote processing services (such as those provided by the GEP).
Input data are normally accessed through remote web services: ESA Virtual Archive, Sentinel Hub, DLR Supersite portal, ASI Data Gateway.
Validation data (GPS time series, previous deformation data, levelling data) are not always provided as a service.
Output results must be stored in the VRC database and exported as web services.
They are subsequently used by other scientists during a consensus process to generate a final product for the end-users.
RO for Volcano deformation mapping
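The validation WF compares the InSAR displacement time series with other data, such as GPS time series. The minimal Python sketch below shows one possible comparison metric: the root-mean-square difference over the dates both series share. It assumes the two series already use the same geometry (e.g. GPS displacements projected onto the radar line of sight), the same units and the same reference epoch. The function name and the sample values are hypothetical.

```python
import math


def rmse_on_common_dates(insar: dict, gps: dict) -> tuple:
    """Root-mean-square difference between two displacement series keyed by date.

    Assumes both series share geometry, units and reference epoch.
    Returns (rmse, number_of_common_dates).
    """
    common = sorted(set(insar) & set(gps))
    if not common:
        raise ValueError("no common dates to compare")
    squared = [(insar[d] - gps[d]) ** 2 for d in common]
    return math.sqrt(sum(squared) / len(squared)), len(common)


if __name__ == "__main__":
    # Hypothetical displacements in mm; only the first three dates overlap.
    insar = {"2015-01-01": 0.0, "2015-02-01": 4.0, "2015-03-01": 7.0}
    gps = {"2015-01-01": 0.0, "2015-02-01": 3.0, "2015-03-01": 9.0, "2015-04-01": 10.0}
    rmse, n = rmse_on_common_dates(insar, gps)
    print(f"RMSE = {rmse:.2f} mm over {n} common dates")  # RMSE = 1.29 mm over 3 common dates
```

Restricting the comparison to common dates avoids interpolating either series. It also means the result depends on how many acquisitions coincide with the GPS sampling.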
Editor's notes
On the one hand, reviewers need to evaluate whether the findings are worthwhile and novel, and whether the method is sound.
On the other hand, the reader needs to trust what she reads.
Scientific communications have at least two goals:
To announce a result
To convince readers that the result is correct
Science is built incrementally on results that can be reused, and therefore reproduced for validation
Experimental science should describe the results and provide a protocol clear enough for successful repetition and extension