SlideShare une entreprise Scribd logo
1  sur  19
ExtremeEarth
From Copernicus Big Data
to Extreme Earth Analytics
This project has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No 825258.
12 September 2019
Earth Observation Φ-week, ESA-ESRIN, Frascati, Italy
Manolis Koubarakis, Konstantina Bereta, Dimitris Bilidas, Angelos
Charalabidis, Konstantinos Giannousis, Theofilos Ioannidis, Vangelis
Karkaletsis, Stasinos Konstantopoulos, Despina-Athanasia Pantazi,
George Stamoulis
Big Linked Geospatial Data Tools in ExtremeEarth
3
Linked Open Data
● The vision of linked open data is to go from a Web of
documents to a Web of data:
○ Unlock open data dormant in their silos.
○ Make it available on the Web using Semantic Web
Technologies (HTTP, URIs, RDF, SPARQL)
○ Interlink it with other open data (e.g., from the
European data portal)
4
Big Linked Geospatial Data
● Information and knowledge extracted from EO data is voluminous: 1 PB of
Sentinel data may contain >750*103 products which will result in >450TB of
information and knowledge (e.g., classes of objects).
● >106 PB of data in the Copernicus Open Access Hub
● ExtremeEarth will develop tools for transforming, integrating, querying and
performing geospatial analytics for the big information and knowledge that will
be mined from Copernicus data and other auxiliary data sources using the
deep learning techniques in the project’s pipeline
5
Overview of tools
● GeoTriples-Publishing geospatial data as RDF graphs
● JedAI-Interlinking Framework
● Strabon - A state-of-the-art spatiotemporal RDF store
● Semagrow - A federated SPARQL query processor
● Geographica - A benchmark for big linked geospatial tools
6
GeoTriples-Publishing geospatial data
as RDF graphs
https://geotriples.di.uoa.gr
7
GeoTriples-Spark
● GeoTriples-Spark is a new implementation of GeoTriples, capable of transforming
big geospatial data into RDF.
● We extended GeoTriples to run on top of Hops and Apache Spark.
8
GeoTriples-Spark (2)
9
GeoTriples-Spark CSV transformation
Input Size #Executors Output Size #Execution time(m)
1GB 2 7.7GB 1
10GB 21 83.4GB 1.6
100GB 41 840.1GB 3.3
250GB 60 2.1TB 6.6
500GB 65 4.1TB 13
1TB 70 8.3TB 26
2TB 80 16.6TB 50
10
JedAI Interlinking Framework
● Based on meta-blocking with the ability to discover geospatial relationships
among resources in geospatial RDF stores
● Implemented in Silk and Radon tools
● Adaptation of all JedAI algorithms, including meta-blocking, to Apache Spark
for massive parallelization
● Integration of state-of-the-art, massively parallel geospatial interlinking
methods in JedAI
● Development of novel geospatial interlinking methods based on meta-
blocking
https://jedai.scify.org/
11
Strabon - A state-of-the-art spatiotemporal RDF
store
http://strabon.di.uoa.gr
12
Strabon on the cloud
● Objective of the project is to develop a new cloud-based Strabon implementation. The
results of this task will make Strabon, currently the state-of-the-art open-source geospatial
RDF store, the first GeoSPARQL distributed engine for big geospatial data and extreme
geospatial analytics.
● design and implement a horizontally scalable RDF store to the PB scale. This means that
the components of the system should have the following capabilities and characteristics:
1. Ingest/bulk import RDF data (e.g. N-Triples, Turtle, TriG, N-Quads, JSON-LD,
RDF/XML) that will be provided in a Hadoop-based distributed filesystem
2. Store the imported data on a distributed data store (eg. HBase, Accumulo, etc.)
3. Create the spatial indexing in addition to all standard indexing schemes enforced
4. Provide the interfaces to query the stored data with SPARQL v1.1 and export the
results in various formats
● Two candidate approaches:
○ GeoSPARK
○ GeoMESA
13
Strabon on the cloud (2)
14
Strabon on the cloud (3)
● Use GeoSPARK library:
○ Rich Spatial Functionality on Apache SPARK
○ Supports both RDD and SQL/DataFrame APIs
○ Easy Integration with HOPS platform
● Strabon with GeoSPARK has been tested on Hops.site
○ Datasets from Geographica benchmark with 100M and 500M triples
○ 3 queries of varying selectivity
○ Same storage schema (tables) as in centralized Strabon PostGIS storage
○ 64 executors, 6 GB executor memory, 2 executor cores, 24G master memory, 8 master
cores
15
Experiments (times in seconds)
Query Dataset Strabon(Cold Cache) Strabon(Warm Cache)
GeoSPARK on
HOPS
SC1
100M 284 224 54
500M 1008 975 102
SC2
100M 87 10 46
500M 2785 297 348
SC3
100M 83 9 60
500M 2005 214 324
(-) Centalized Strabon performs better on warm cache for SC2 and SC3 even
for the 500M dataset
(+) GeoSPARK seems to achieving better scalability. Needs further
experiments witg larger datasets
16
Improving the Storage Schema for RDF Data
● Storage schema used in the experiments is not optimized for cloud
processing
● Extended Vertical Partitioning schema from S2RDF seems promising
○ computes semijoins between RDF predicates depending on selectivity
○ cache semijoins results
● Implementing a data loader in HOPS that reads RDF data and creates
Parquet files according to Extended Vertical Partitioning
17
Semagrow - A federated SPARQL query
processor
http://semagrow.github.io/
● Semagrow exposes a single SPARQL endpoint that federates a number of heterogeneous data sources.
● Targets the federation of heterogeneous and independently provided data sources. In other words,
Semagrow aims to offer the most efficient distributed querying solution that can be achieved without
controlling the way data is distributed between sources and, in general, without having the responsibility to
centrally manage the data sources of the federation.
● Integrate multiple data sources at query-time.
○ No need to ingest the whole dataset in order to integrate.
○ All queries return updated answers.
○ Easily add and remove a data source from the federation without affecting the application.
● Data sources can be heterogeneous.
● Semagrow can handle different systems with different query expressivity. Currently supports:
■ SPARQL 1.0 and 1.1
■ CassandraQL
● Objective in Extreme Earth: Integrate geospatial operations in multiple big geospatial data sources.
18
Geographica - A benchmark for big linked
geospatial tools
● Geographica 2 is a recently published version of the Geographica benchmark for evaluating
geospatial RDF stores, which it improves by adding more workloads, extending the existing ones
and evaluating more RDF stores. It tests the efficiency of primitive spatial functions in RDF
stores, the performance of single node RDF stores in real use case scenarios, a more detailed
evaluation is performed using a synthetic workload and finally the scalability of the RDF stores is
revealed with the scalability workload.
● Geographica will be futher extended with data sources and scenarios from the two use cases of
ExtremeEarth, so that it can be used to evaluate the performance of the transformation,
interlinking, querying and integration systems in cloud environments.
● http://geographica2.di.uoa.gr/
Thank you!

Contenu connexe

Tendances

Processing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechProcessing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechRob Emanuele
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pRobert Grossman
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefRobert Grossman
 
OpenTopography - Scalable Services for Geosciences Data
OpenTopography - Scalable Services for Geosciences DataOpenTopography - Scalable Services for Geosciences Data
OpenTopography - Scalable Services for Geosciences DataOpenTopography Facility
 
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsEnabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsRob Emanuele
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechRob Emanuele
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsTanu Malik
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersIan Foster
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataRobert Grossman
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationRob Emanuele
 
Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Robert Grossman
 
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Robert Grossman
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefRobert Grossman
 
Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Robert Grossman
 
Research Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataResearch Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataRicard de la Vega
 
An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)Robert Grossman
 

Tendances (20)

Processing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechProcessing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtech
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
 
OpenTopography - Scalable Services for Geosciences Data
OpenTopography - Scalable Services for Geosciences DataOpenTopography - Scalable Services for Geosciences Data
OpenTopography - Scalable Services for Geosciences Data
 
ICESat-2 Metadata and Status
ICESat-2 Metadata and StatusICESat-2 Metadata and Status
ICESat-2 Metadata and Status
 
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsEnabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
 
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTechGeoSpatially enabling your Spark and Accumulo clusters with LocationTech
GeoSpatially enabling your Spark and Accumulo clusters with LocationTech
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC Programs
 
Geospatial data
Geospatial dataGeospatial data
Geospatial data
 
Omid: A transactional Framework for HBase
Omid: A transactional Framework for HBaseOmid: A transactional Framework for HBase
Omid: A transactional Framework for HBase
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
 
Q4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis PresentationQ4 2016 GeoTrellis Presentation
Q4 2016 GeoTrellis Presentation
 
Pilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOTPilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOT
 
Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)
 
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster Relief
 
Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)
 
Research Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories MetadataResearch Papers Recommender based on Digital Repositories Metadata
Research Papers Recommender based on Digital Repositories Metadata
 
An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)
 

Similaire à Big linked geospatial data tools in ExtremeEarth-phiweek19

Big Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open WorkshopBig Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open WorkshopExtremeEarth
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopExtremeEarth
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020GEO Analytics Canada
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionIan Foster
 
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...BigData_Europe
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19ExtremeEarth
 
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Databricks
 
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...ExtremeEarth
 
Ncar globally accessible user environment
Ncar globally accessible user environmentNcar globally accessible user environment
Ncar globally accessible user environmentinside-BigData.com
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
Toward a National Research Platform
Toward a National Research PlatformToward a National Research Platform
Toward a National Research PlatformLarry Smarr
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Databricks
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Ivan Ermilov
 
Hot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkHot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkSupriya .
 
Analysis Ready Data workshop - OGC presentation
Analysis Ready Data workshop - OGC presentation Analysis Ready Data workshop - OGC presentation
Analysis Ready Data workshop - OGC presentation George Percivall
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada
 
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013Kostis Kyzirakos
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research PlatformLarry Smarr
 

Similaire à Big linked geospatial data tools in ExtremeEarth-phiweek19 (20)

Big Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open WorkshopBig Linked Data Querying - ExtremeEarth Open Workshop
Big Linked Data Querying - ExtremeEarth Open Workshop
 
Big Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open WorkshopBig Linked Data Federation - ExtremeEarth Open Workshop
Big Linked Data Federation - ExtremeEarth Open Workshop
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
BigDataEurope 1st SC5 Workshop, Project Teleios & LEO, by M. Koubarakis, Univ...
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
 
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
 
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
Artificial Intelligence and Big Data Technologies for Copernicus Data: the Ex...
 
Ncar globally accessible user environment
Ncar globally accessible user environmentNcar globally accessible user environment
Ncar globally accessible user environment
 
A Performance Study of Big Spatial Data Systems
A Performance Study of Big Spatial Data SystemsA Performance Study of Big Spatial Data Systems
A Performance Study of Big Spatial Data Systems
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Toward a National Research Platform
Toward a National Research PlatformToward a National Research Platform
Toward a National Research Platform
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
 
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
Lodstats: The Data Web Census Dataset. Kobe, Japan, 2016
 
Hot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkHot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark framework
 
Analysis Ready Data workshop - OGC presentation
Analysis Ready Data workshop - OGC presentation Analysis Ready Data workshop - OGC presentation
Analysis Ready Data workshop - OGC presentation
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020
 
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
 

Plus de ExtremeEarth

Polar Use Case - ExtremeEarth Open Workshop
Polar Use Case  - ExtremeEarth Open WorkshopPolar Use Case  - ExtremeEarth Open Workshop
Polar Use Case - ExtremeEarth Open WorkshopExtremeEarth
 
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopExtremeEarth
 
ExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth
 
ExtremeEarth Open Workshop - Overview and Achievements
ExtremeEarth Open Workshop - Overview and AchievementsExtremeEarth Open Workshop - Overview and Achievements
ExtremeEarth Open Workshop - Overview and AchievementsExtremeEarth
 
AI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open WorkshopAI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open WorkshopExtremeEarth
 
Big Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open WorkshopBig Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open WorkshopExtremeEarth
 
Food Security Use Case - ExtremeEarth Open Workshop
Food Security Use Case - ExtremeEarth Open WorkshopFood Security Use Case - ExtremeEarth Open Workshop
Food Security Use Case - ExtremeEarth Open WorkshopExtremeEarth
 
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth
 
Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...ExtremeEarth
 
Snow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and IrrigationSnow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and IrrigationExtremeEarth
 
Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19ExtremeEarth
 
The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19ExtremeEarth
 
Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19ExtremeEarth
 
Copernicus and AI workshop 2020
Copernicus and AI workshop 2020Copernicus and AI workshop 2020
Copernicus and AI workshop 2020ExtremeEarth
 
LPS19 ExtremeEarth Project
LPS19 ExtremeEarth ProjectLPS19 ExtremeEarth Project
LPS19 ExtremeEarth ProjectExtremeEarth
 

Plus de ExtremeEarth (15)

Polar Use Case - ExtremeEarth Open Workshop
Polar Use Case  - ExtremeEarth Open WorkshopPolar Use Case  - ExtremeEarth Open Workshop
Polar Use Case - ExtremeEarth Open Workshop
 
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open Workshop
 
ExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - IntroductionExtremeEarth Open Workshop - Introduction
ExtremeEarth Open Workshop - Introduction
 
ExtremeEarth Open Workshop - Overview and Achievements
ExtremeEarth Open Workshop - Overview and AchievementsExtremeEarth Open Workshop - Overview and Achievements
ExtremeEarth Open Workshop - Overview and Achievements
 
AI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open WorkshopAI models for Ice Classification - ExtremeEarth Open Workshop
AI models for Ice Classification - ExtremeEarth Open Workshop
 
Big Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open WorkshopBig Linked Data Interlinking - ExtremeEarth Open Workshop
Big Linked Data Interlinking - ExtremeEarth Open Workshop
 
Food Security Use Case - ExtremeEarth Open Workshop
Food Security Use Case - ExtremeEarth Open WorkshopFood Security Use Case - ExtremeEarth Open Workshop
Food Security Use Case - ExtremeEarth Open Workshop
 
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation DataExtremeEarth Data Science Pipeline for Linked Earth Observation Data
ExtremeEarth Data Science Pipeline for Linked Earth Observation Data
 
Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...Artificial Intelligence in the Earth Observation Domain: Current European Res...
Artificial Intelligence in the Earth Observation Domain: Current European Res...
 
Snow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and IrrigationSnow Monitoring for Water Availability and Irrigation
Snow Monitoring for Water Availability and Irrigation
 
Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19Polar Use Case in ExtremeEarth-phiweek19
Polar Use Case in ExtremeEarth-phiweek19
 
The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19The ExtremeEarth infrastructure-phiweek19
The ExtremeEarth infrastructure-phiweek19
 
Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19Food security use case in ExtremeEarth-phiweek19
Food security use case in ExtremeEarth-phiweek19
 
Copernicus and AI workshop 2020
Copernicus and AI workshop 2020Copernicus and AI workshop 2020
Copernicus and AI workshop 2020
 
LPS19 ExtremeEarth Project
LPS19 ExtremeEarth ProjectLPS19 ExtremeEarth Project
LPS19 ExtremeEarth Project
 

Dernier

Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburgmasabamasaba
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 

Dernier (20)

Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 

Big linked geospatial data tools in ExtremeEarth-phiweek19

  • 1. ExtremeEarth From Copernicus Big Data to Extreme Earth Analytics This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825258.
  • 2. 12 September 2019 Earth Observation Φ-week, ESA-ESRIN, Frascati, Italy Manolis Koubarakis, Konstantina Bereta, Dimitris Bilidas, Angelos Charalabidis, Konstantinos Giannousis, Theofilos Ioannidis, Vangelis Karkaletsis, Stasinos Konstantopoulos, Despina-Athanasia Pantazi, George Stamoulis Big Linked Geospatial Data Tools in ExtremeEarth
  • 3. 3 Linked Open Data ● The vision of linked open data is to go from a Web of documents to a Web of data: ○ Unlock open data dormant in their silos. ○ Make it available on the Web using Semantic Web Technologies (HTTP, URIs, RDF, SPARQL) ○ Interlink it with other open data (e.g., from the European data portal)
  • 4. 4 Big Linked Geospatial Data ● Information and knowledge extracted from EO data is voluminous: 1 PB of Sentinel data may contain >750*103 products which will result in >450TB of information and knowledge (e.g., classes of objects). ● >106 PB of data in the Copernicus Open Access Hub ● ExtremeEarth will develop tools for transforming, integrating, querying and performing geospatial analytics for the big information and knowledge that will be mined from Copernicus data and other auxiliary data sources using the deep learning techniques in the project’s pipeline
  • 5. 5 Overview of tools ● GeoTriples-Publishing geospatial data as RDF graphs ● JedAI-Interlinking Framework ● Strabon - A state-of-the-art spatiotemporal RDF store ● Semagrow - A federated SPARQL query processor ● Geographica - A benchmark for big linked geospatial tools
  • 6. 6 GeoTriples-Publishing geospatial data as RDF graphs https://geotriples.di.uoa.gr
  • 7. 7 GeoTriples-Spark ● GeoTriples-Spark is a new implementation of GeoTriples, capable of transforming big geospatial data into RDF. ● We extended GeoTriples to run on top of Hops and Apache Spark.
  • 9. 9 GeoTriples-Spark CSV transformation Input Size #Executors Output Size #Execution time(m) 1GB 2 7.7GB 1 10GB 21 83.4GB 1.6 100GB 41 840.1GB 3.3 250GB 60 2.1TB 6.6 500GB 65 4.1TB 13 1TB 70 8.3TB 26 2TB 80 16.6TB 50
  • 10. 10 JedAI Interlinking Framework ● Based on meta-blocking with the ability to discover geospatial relationships among resources in geospatial RDF stores ● Implemented in Silk and Radon tools ● Adaptation of all JedAI algorithms, including meta-blocking, to Apache Spark for massive parallelization ● Integration of state-of-the-art, massively parallel geospatial interlinking methods in JedAI ● Development of novel geospatial interlinking methods based on meta- blocking https://jedai.scify.org/
  • 11. 11 Strabon - A state-of-the-art spatiotemporal RDF store http://strabon.di.uoa.gr
  • 12. 12 Strabon on the cloud ● Objective of the project is to develop a new cloud-based Strabon implementation. The results of this task will make Strabon, currently the state-of-the-art open-source geospatial RDF store, the first GeoSPARQL distributed engine for big geospatial data and extreme geospatial analytics. ● design and implement a horizontally scalable RDF store to the PB scale. This means that the components of the system should have the following capabilities and characteristics: 1. Ingest/bulk import RDF data (e.g. N-Triples, Turtle, TriG, N-Quads, JSON-LD, RDF/XML) that will be provided in a Hadoop-based distributed filesystem 2. Store the imported data on a distributed data store (eg. HBase, Accumulo, etc.) 3. Create the spatial indexing in addition to all standard indexing schemes enforced 4. Provide the interfaces to query the stored data with SPARQL v1.1 and export the results in various formats ● Two candidate approaches: ○ GeoSPARK ○ GeoMESA
  • 13. 13 Strabon on the cloud (2)
  • 14. 14 Strabon on the cloud (3) ● Use GeoSPARK library: ○ Rich Spatial Functionality on Apache SPARK ○ Supports both RDD and SQL/DataFrame APIs ○ Easy Integration with HOPS platform ● Strabon with GeoSPARK has been tested on Hops.site ○ Datasets from Geographica benchmark with 100M and 500M triples ○ 3 queries of varying selectivity ○ Same storage schema (tables) as in centralized Strabon PostGIS storage ○ 64 executors, 6 GB executor memory, 2 executor cores, 24G master memory, 8 master cores
  • 15. 15 Experiments (times in seconds) Query Dataset Strabon(Cold Cache) Strabon(Warm Cache) GeoSPARK on HOPS SC1 100M 284 224 54 500M 1008 975 102 SC2 100M 87 10 46 500M 2785 297 348 SC3 100M 83 9 60 500M 2005 214 324 (-) Centalized Strabon performs better on warm cache for SC2 and SC3 even for the 500M dataset (+) GeoSPARK seems to achieving better scalability. Needs further experiments witg larger datasets
  • 16. 16 Improving the Storage Schema for RDF Data ● Storage schema used in the experiments is not optimized for cloud processing ● Extended Vertical Partitioning schema from S2RDF seems promising ○ computes semijoins between RDF predicates depending on selectivity ○ cache semijoins results ● Implementing a data loader in HOPS that reads RDF data and creates Parquet files according to Extended Vertical Partitioning
  • 17. 17 Semagrow - A federated SPARQL query processor http://semagrow.github.io/ ● Semagrow exposes a single SPARQL endpoint that federates a number of heterogeneous data sources. ● Targets the federation of heterogeneous and independently provided data sources. In other words, Semagrow aims to offer the most efficient distributed querying solution that can be achieved without controlling the way data is distributed between sources and, in general, without having the responsibility to centrally manage the data sources of the federation. ● Integrate multiple data sources at query-time. ○ No need to ingest the whole dataset in order to integrate. ○ All queries return updated answers. ○ Easily add and remove a data source from the federation without affecting the application. ● Data sources can be heterogeneous. ● Semagrow can handle different systems with different query expressivity. Currently supports: ■ SPARQL 1.0 and 1.1 ■ CassandraQL ● Objective in Extreme Earth: Integrate geospatial operations in multiple big geospatial data sources.
  • 18. 18 Geographica - A benchmark for big linked geospatial tools ● Geographica 2 is a recently published version of the Geographica benchmark for evaluating geospatial RDF stores, which it improves by adding more workloads, extending the existing ones and evaluating more RDF stores. It tests the efficiency of primitive spatial functions in RDF stores, the performance of single node RDF stores in real use case scenarios, a more detailed evaluation is performed using a synthetic workload and finally the scalability of the RDF stores is revealed with the scalability workload. ● Geographica will be futher extended with data sources and scenarios from the two use cases of ExtremeEarth, so that it can be used to evaluate the performance of the transformation, interlinking, querying and integration systems in cloud environments. ● http://geographica2.di.uoa.gr/