This presentation describes a framework for semantically enabled data discovery and integration across multiple Earth science disciplines. Data harmonization is based on the principles of Linked Data. Previous work defines Data Cube extensions that are relevant to specific Earth science disciplines. To provide a generic, domain-independent solution, we propose an upper-level vocabulary (the ENVRI vocabulary) that expresses domain-specific information at a higher level of abstraction.
For human users, we provide an interactive Web-based user interface for data discovery and integration across multiple research infrastructures (http://portal.envri.eu). The system is demonstrated on the use case of the Icelandic volcano eruption of April 10, 2010.
Semantically-Enabled Environmental Data Discovery and Integration: Demonstration Using the Iceland Volcano Use Case
1. Semantically-Enabled Environmental Data Discovery and Integration: Demonstration Using the Icelandic Volcano Use Case
Tatiana Tarasova (1), Massimo Argenti (2), Maarten Marx (1)
(1) ISLA, University of Amsterdam
(2) European Space Agency
October 7, 2013
System description papers, KESW 2013
3. Example Case Study
Icelandic Volcano Eruption [6]
What was the impact of the eruption of the Icelandic volcano (Eyjafjallajökull) in 2010 on the environment?
4. What data can contribute to the research?
[Figure: candidate research infrastructures: SIOS, AURORA BOREALIS, EMSO, EUFAR-COPAL, LIFEWATCH, EISCAT-3D, EPOS, EURO-ARGO, IAGOS-ERI, ICOS ATC]
5. Technological and Structural Data Heterogeneity
[Figure: two data sources, EURO-ARGO (ocean temperature) and ICOS ATC (atmospheric measurements). Format and access labels: CSV, NetCDF, FTP catalogues, authorized IP access]
6. Semantic Data Heterogeneity
[Figure: two data sources, EURO-ARGO (ocean temperature) and ICOS ATC (atmospheric measurements). Format and access labels: CSV, NetCDF, FTP catalogues, authorized IP access. Domain-specific terms: hourly, ppm, measurements level 2, flask; platform, good quality, float trajectories]
8. Environmental Data Discovery
Approach
discover data through a single harmonized metadata catalogue
enable semantic data discovery through semantic tagging of datasets
Implementation
ENVRI portal http://portal.envri.eu
OpenSearch [11] based catalogue
1350 data series, 288,971 triples stored in Sesame [12]
geospatial metadata model that extends the INSPIRE guidelines [7] with richer semantics
http://portal.genesi-dec.eu/news/?id=117
semantic tagging against a set of the Earth Science vocabularies
(GCMD [8], SBA [9], GEMET [10])
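A catalogue search of this kind can be sketched as filling an OpenSearch URL template. This is a minimal sketch under stated assumptions: the host, the query parameter names and the bounding box are hypothetical and are not taken from the ENVRI portal; only the template parameters ({searchTerms}, {geo:box?}, {time:start?}, {time:end?}) follow the OpenSearch specification and its Geo and Time extensions.

```python
import re
from urllib.parse import quote

# Hypothetical OpenSearch-style URL template; the real portal template may differ.
TEMPLATE = (
    "http://portal.example.org/search?q={searchTerms}"
    "&bbox={geo:box?}&start={time:start?}&end={time:end?}"
)


def fill_template(template: str, params: dict) -> str:
    """Substitute OpenSearch template parameters.

    Parameters written as {name?} are optional and become empty when missing;
    parameters written as {name} are required.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe=",")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", substitute, template)


# Illustrative search: free text, a rough box around Iceland
# (west,south,east,north) and the eruption period from the use case.
url = fill_template(TEMPLATE, {
    "searchTerms": "volcanic ash",
    "geo:box": "-25.0,63.0,-13.0,67.0",
    "time:start": "2010-03-20",
    "time:end": "2010-06-23",
})
print(url)
```

Keeping the template separate from the parameter values mirrors how an OpenSearch client reads the template from a description document instead of hard-coding the query URL.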
10. Linked Environmental Data
Linked Data [13]
→ publish data not documents!
Environmental Data
→ datasets with observations
Linked Environmental Data
→ publish observations not datasets!
→ a fine-grained representation of environmental data brings new opportunities to query and integrate environmental data at the level of single observations
11. Atmospheric Measurements (ICOS) [14]
Dataset “CO2 concentration measured by Mace Head”
Dimensions:
Time
Geospatial location
Unit of Measure
Observed Phenomenon …
Observation:
“CO2 concentration in the air measured by Mace Head on 2010-01-05 was 392.011”
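The observation above could be published as a single RDF resource roughly as follows. This is a minimal sketch, not the authors' actual data: `qb:Observation` and `qb:dataSet` are terms of the RDF Data Cube vocabulary, while the `ex:` namespace, the identifiers and the three dimension and measure properties are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/icos/> .
"""


@dataclass
class Observation:
    obs_id: str    # hypothetical local identifier
    dataset: str   # hypothetical dataset identifier
    station: str
    date: str      # ISO 8601 date
    value: float

    def to_turtle(self) -> str:
        """Serialize the observation as one Turtle resource description."""
        return (
            f"ex:{self.obs_id} a qb:Observation ;\n"
            f"    qb:dataSet ex:{self.dataset} ;\n"
            f"    ex:station ex:{self.station} ;\n"              # hypothetical property
            f'    ex:date "{self.date}"^^xsd:date ;\n'           # hypothetical property
            f'    ex:co2Concentration "{self.value}"^^xsd:decimal .\n'  # hypothetical property
        )


obs = Observation("obs-2010-01-05", "co2-mace-head", "MaceHead", "2010-01-05", 392.011)
print(PREFIXES + obs.to_turtle())
```

Each measurement becomes its own addressable resource, which is what allows queries at the level of single observations instead of whole datasets.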
14. Related Work
RDF Data Cube [17] based approaches
Linked Environmental Data [1]
The ACORN-SAT Linked Climate Dataset [2]
A Linked Data Framework for Publishing the UK Environmental Data [4]
Data Cube: core model
Observation --has dataset--> Dataset
Dataset --has structure--> Dataset Structure
Dataset Structure --has dimension--> Dimension
15. But what about semantic data interoperability?
Question
Can we find generic concepts to capture domain semantics of
environmental data?
Data Cube Extension
Observation --has dataset--> Dataset
Dataset --has structure--> Dataset Structure
...
has location --> Location
has time --> Time
has Feature of Interest --> FeatureOfInterest
16. ENVRI vocabulary [18] (based on OGC O&M [19])
Classes: Observation, Feature of Interest, Observed Property, Procedure, Result, Location, Time
Properties: has Feature of Interest, has Observed Property, has Property, has Procedure, has Result, has Location, has Time
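The idea of a generic, domain-independent observation can be illustrated with a small in-memory model. This is a sketch under assumptions: it attaches every property directly to the observation, as OGC O&M does for feature of interest, observed property, procedure and result, which may differ from the actual ENVRI vocabulary. The Mace Head coordinates are approximate, and the ocean record is a made-up placeholder, not real Euro-Argo data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    feature_of_interest: str
    observed_property: str
    procedure: str
    location: tuple  # (latitude, longitude) in decimal degrees
    time: date
    result: float


observations = [
    # Value and date from the ICOS example; coordinates are approximate.
    Observation("air", "CO2 concentration", "Mace Head station",
                (53.33, -9.90), date(2010, 1, 5), 392.011),
    # Placeholder record for illustration only, not real data.
    Observation("ocean", "temperature", "Argo float (hypothetical)",
                (60.0, -20.0), date(2010, 4, 15), 8.5),
]


def observed_properties(obs_list):
    """List the distinct phenomena measured, regardless of domain."""
    return sorted({o.observed_property for o in obs_list})


print(observed_properties(observations))
```

Because atmospheric and ocean records share the same fields, one function answers "what was measured?" for both domains without any domain-specific code.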
18. Demonstration
Data
ICOS CO2 concentration, Euro-Argo - ocean temperature
35 collections, 10,520 observations, 136,556 RDF triples
Implementation
RDF generation - RDF Data Cube plug-in for Google Refine [5]
storage - Virtuoso RDF store
access - http://data.politicalmashup.nl/sparql/
19. Queries
1: query for individual observations and subsets of observations
Retrieve all observations for the days of the volcano eruption (from 20 March to 23 June, 2010).
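A query of the first kind might look like the SPARQL text built below. This is a sketch, not the query used in the demonstration: the `envri:` namespace separator and the property names `envri:hasTime` and `envri:hasResult` are assumptions derived from the labels on the vocabulary slide, and the query assumes that observation times are stored as `xsd:date` literals that the store can compare.

```python
from datetime import date


def observations_between(start: str, end: str) -> str:
    """Build a SPARQL query for observations in an inclusive date range."""
    # Validate the inputs as ISO dates before placing them in the query text.
    date.fromisoformat(start)
    date.fromisoformat(end)
    return f"""PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX envri: <http://data.politicalmashup.nl/RDF/vocabularies/envri#>

SELECT ?obs ?time ?result WHERE {{
  ?obs a qb:Observation ;
       envri:hasTime ?time ;       # assumed property name
       envri:hasResult ?result .   # assumed property name
  FILTER (?time >= "{start}"^^xsd:date && ?time <= "{end}"^^xsd:date)
}}
ORDER BY ?time"""


query = observations_between("2010-03-20", "2010-06-23")
print(query)
```

The resulting text could be sent to a SPARQL endpoint such as the one listed on the demonstration slide.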
2: exploit the semantics of the ENVRI vocabulary terms
What phenomena were measured in 2010 in the area near the volcano?
What instruments were used to make measurements in 2010 in the area near the volcano?
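The two questions of the second kind differ only in the property they ask about, so both can be sketched with one query builder. All `envri:` property names here are assumptions based on the vocabulary slide, the use of W3C WGS84 `geo:lat` and `geo:long` with numeric values for locations is an assumption, and the bounding box in the usage example is a hypothetical stand-in for "the area near the volcano".

```python
# Map each question to the (assumed) ENVRI property it relies on.
PREDICATES = {
    "phenomena": "envri:hasObservedProperty",
    "instruments": "envri:hasProcedure",
}


def nearby_query(what: str, south: float, west: float,
                 north: float, east: float, year: int = 2010) -> str:
    """Build a SPARQL query listing distinct phenomena or instruments
    for observations inside a bounding box during one year."""
    predicate = PREDICATES[what]  # raises KeyError for unknown questions
    return f"""PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX envri: <http://data.politicalmashup.nl/RDF/vocabularies/envri#>

SELECT DISTINCT ?answer WHERE {{
  ?obs {predicate} ?answer ;
       envri:hasTime ?time ;
       envri:hasLocation ?loc .
  ?loc geo:lat ?lat ;
       geo:long ?long .
  FILTER (?time >= "{year}-01-01"^^xsd:date && ?time <= "{year}-12-31"^^xsd:date)
  FILTER (?lat >= {south} && ?lat <= {north} && ?long >= {west} && ?long <= {east})
}}"""


# Hypothetical box around Iceland.
print(nearby_query("phenomena", 63.0, -25.0, 67.0, -13.0))
print(nearby_query("instruments", 63.0, -25.0, 67.0, -13.0))
```

Neither query names a dataset or a discipline: the generic terms of the vocabulary carry the meaning, which is the point of the second query type.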
21. Conclusion
→ data discovery through a harmonized metadata catalogue based on
the geospatial metadata model
→ a fine-grained representation of environmental data enables queries that retrieve and integrate data at the level of single observations instead of pre-defined collections
→ ENVRI vocabulary enables semantically rich queries
Future Work
→ Alignment between data models for data discovery and data
harmonization
→ Systematic study of the proposed modelling solution
23. Questions?
Thank you!
→ ENVRI portal http://portal.envri.eu
→ more about Linked Environmental Data
http://staff.science.uva.nl/~ttaraso1/html/envri.html
24. References I
[1] Rüther, M., Fock, J., and Hubener, J.: Linked Environmental Data. 24th International Conference on Informatics for Environmental Protection (2010)
[2] Rüther, M., Fock, J., and Hubener, J.: The ACORN-SAT Linked Climate Dataset. Semantic Web Journal (2013) http://www.semantic-web-journal.net/system/files/swj457.pdf
[3] The ENVRI vocabulary http://data.politicalmashup.nl/RDF/vocabularies/envri
[4] Shaon, A., Woolf, A., Boczek, R., Rogers, W., and Jackson, M.: An Open Source Linked Data Framework for Publishing Environmental Data under the UK Location Strategy. Proceedings of the Terra Cognita Workshop on Foundations, Technologies and Applications of the Geospatial Web (2011) http://ceur-ws.org/Vol-798/paper6.pdf
[5] The Data Cube plug-in for Google Refine http://refine.deri.ie/qbExport
[6] 2010 eruptions of Eyjafjallajökull on Wikipedia http://en.wikipedia.org/wiki/2010_eruptions_of_Eyjafjallaj%C3%B6kull
25. References II
[7] State of progress in the development of guidelines to express elements of the Infrastructure for Spatial Information in the European Community (INSPIRE) metadata implementing rules using ISO 15836 (Dublin Core). European Commission (2008) http://inspire.jrc.ec.europa.eu/reports/ImplementingRules/metadata/MD_IR_and_DC_state%20of%20progress.pdf
[8] The Global Change Master Directory (GCMD) http://gcmd.nasa.gov/
[9] The Societal Benefit Area vocabularies (SBA) http://www.earthobservations.org/
[10] The GEneral Multilingual Environmental Thesaurus (GEMET) http://www.eionet.europa.eu/gemet/
[11] The OpenSearch standard protocol http://www.opensearch.org/
[12] Sesame http://www.openrdf.org/
[13] Berners-Lee, T.: Linked data - design issues (2006) http://www.w3.org/DesignIssues/LinkedData.html
26. References III
[14] The Integrated Carbon Observation System (ICOS), Atmospheric Measurements System https://icos-atc-demo.lsce.ipsl.fr/
[15] Euro-Argo http://www.argodatamgt.org/
[16] The Aerosols, Clouds, and Trace Gases Research Infrastructure Network (ACTRIS) www.actris.net
[17] The Data Cube vocabulary http://www.w3.org/TR/2013/WD-vocab-data-cube-20130312/
[18] The ENVRI vocabulary http://data.politicalmashup.nl/RDF/vocabularies/envri
[19] Geographic Information: Observations and Measurements. OGC Abstract Specification http://www.opengeospatial.org/standards/om