In this paper, we pose the implementation of an efficient query processing system for incomplete temporal and geospatial information in RDFi as a challenge to the SSTD community.
1. Querying Incomplete Geospatial Information in RDF
Charalampos Nikolaou and Manolis Koubarakis
Department of Informatics and Telecommunications
National and Kapodistrian University of Athens
International Symposium on Spatial and Temporal Databases (SSTD) 2013
August 23, 2013
2. Motivation
• Increased interest in publishing geospatial datasets
as linked data (i.e., encoded in RDF and with
semantic links to other datasets)
• Geospatial information might be:
o Quantitative (e.g., exact geometric information)
o Qualitative (e.g., topological relations)
... and express knowledge that is
o Complete
o Incomplete (or indefinite)
6. Linked Geospatial Data
[Figure: the Linking Open Data cloud diagram as of September 2011 (~62 billion triples), with datasets grouped as media, geographic, publications, user-generated content, government, cross-domain, and life sciences]
7. Question
How do we manage (represent, store,
query) this data efficiently?
8. Challenges: Theory
① RDF extensions for representing and querying incomplete qualitative and quantitative geospatial information
• GeoSPARQL
o Standard OGC query language for RDF data with geospatial information
o Topological relations can be expressed/queried, but no reasoning is offered
• We proposed RDFi
o Can work with any topological/temporal constraint language with/without constant symbols (e.g., RCC-5, RCC-8, IA)
o Formal semantics and algorithm for computing certain answers
o Preliminary complexity results for various constraint languages
• No published algorithm for query processing when considering RCC-8 and constants
10. RDFi by example (cont'd)
Query: Find fires inside the region of West Greece.
[Figure: map of the region of West Greece, with Olympia marked]
GeoSPARQL query (with CERTAIN, asking for certain answers):
CERTAIN SELECT ?f
WHERE {
  ?f rdf:type noa:Fire.
  gag:WestGreece geo:sfContains ?f.
}
11. RDFi by example (cont'd)
Query: Find fires inside the region of West Greece.
[Figure: the same map, annotated with two "contains" relations between the regions shown]
GeoSPARQL query: the same as on slide 10.
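The point of the example is that a fire may be inside West Greece only as a consequence of chained containment facts, which plain pattern matching does not see. A minimal sketch, assuming hypothetical triples (ex:Olympia, ex:fire1 and ex:fire2 are invented names) and assuming geo:sfContains is the only spatial predicate, so that its transitivity for polygonal regions is the only reasoning needed. It is not the RDFi query processing algorithm, only an illustration of why CERTAIN must reason.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, predicate, object); ex: names are invented.
TRIPLES = [
    ("gag:WestGreece", "geo:sfContains", "ex:Olympia"),
    ("ex:Olympia", "geo:sfContains", "ex:fire1"),
    ("ex:fire1", "rdf:type", "noa:Fire"),
    ("ex:fire2", "rdf:type", "noa:Fire"),  # no location information at all
]


def fires(triples):
    """All subjects typed as noa:Fire."""
    return {s for s, p, o in triples if p == "rdf:type" and o == "noa:Fire"}


def naive_answers(triples, region):
    """Plain pattern matching: only explicitly asserted containment counts."""
    candidates = fires(triples)
    return {o for s, p, o in triples
            if s == region and p == "geo:sfContains" and o in candidates}


def certain_answers(triples, region):
    """Fires contained in `region` in every model of this containment-only data.

    geo:sfContains is transitive for polygonal regions, so this sketch returns
    the fires reachable from `region` along geo:sfContains edges (BFS)."""
    children = defaultdict(set)
    for s, p, o in triples:
        if p == "geo:sfContains":
            children[s].add(o)
    reached, queue = set(), deque([region])
    while queue:
        for child in children[queue.popleft()]:
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return reached & fires(triples)


print(naive_answers(TRIPLES, "gag:WestGreece"))    # set()
print(certain_answers(TRIPLES, "gag:WestGreece"))  # {'ex:fire1'}
```

ex:fire2 is not returned: nothing in the data forces it to lie inside West Greece, so there is a possible world in which it does not.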
12. Challenges: Theory
② Efficient computation of the entailment relation
Φ⊨Θ
• where Φ and Θ are quantifier-free first-order
formulas of a constraint language expressing the
topological relations of various frameworks (RCC-8,
DE-9IM, etc.)
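As a hypothetical illustration (the region names and the choice of RCC-8 atoms are assumptions, not data from the talk), let Φ state that Olympia is a non-tangential proper part of West Greece and that a fire f is a non-tangential proper part of Olympia. Since the RCC-8 composition of NTPPi with NTPPi is NTPPi, Φ entails the atom Θ asked for by the query:

```latex
\begin{aligned}
\Phi &= \mathrm{NTPPi}(\mathit{WestGreece}, \mathit{Olympia}) \wedge \mathrm{NTPPi}(\mathit{Olympia}, f) \\
\Theta &= \mathrm{NTPPi}(\mathit{WestGreece}, f) \\
\mathrm{NTPPi} \circ \mathrm{NTPPi} &= \mathrm{NTPPi} \quad\Longrightarrow\quad \Phi \models \Theta
\end{aligned}
```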
13. Challenges: Theory
③ Computing entailment is equivalent to checking the consistency of formulas with constraint networks
• Constraint networks:
o Spatial relations among regions
o Regions may be constants (exact geometric information) or identified by a URI
• The most recent results consider basic and complete RCC-5 networks with polygonal regions
• For RCC-8, deciding consistency is NP-complete in general
• No published algorithm for checking consistency of RCC-8 networks with constants
• Are there tractable cases?
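The equivalence is the usual refutation argument: Φ entails Θ exactly when no model satisfies Φ together with ¬Θ. The eight RCC-8 base relations are jointly exhaustive and pairwise disjoint, so negating an atom gives the disjunction of the other seven. For a hypothetical atom relating a region WestGreece and a fire f:

```latex
\begin{aligned}
\Phi \models \Theta \;&\Longleftrightarrow\; \Phi \wedge \neg\Theta \text{ is inconsistent} \\
\neg\,\mathrm{NTPPi}(\mathit{WestGreece}, f) \;&\equiv\; \{\mathrm{DC}, \mathrm{EC}, \mathrm{PO}, \mathrm{EQ}, \mathrm{TPP}, \mathrm{NTPP}, \mathrm{TPPi}\}(\mathit{WestGreece}, f)
\end{aligned}
```

For a single atom like this one, the check amounts to showing that Φ is inconsistent with each of the seven alternatives in turn.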
14. Challenges: Practice
④ Scale to billions of triples
• Reasoners from the qualitative spatial reasoning (QSR) community scale only up to hundreds of regions with complex spatial relations
How do they perform in our case?
• Setting:
o Real linked geospatial datasets
o No constants
o Only base RCC-8 relations
o Evaluation of consistency checking using the well-known path-consistency algorithm
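A minimal sketch of queue-based path consistency, under stated assumptions: it uses the five RCC-5 base relations instead of the eight RCC-8 ones only to keep the composition table short, it is not necessarily the implementation used in the experiments, and the example regions are hypothetical. It materialises the complete network, which is the source of the O(n²) memory and O(n³) time reported on slide 15. For networks of base relations, as in the setting above, path consistency is known to decide RCC-8 consistency.

```python
from collections import deque
from itertools import product

# RCC-5 base relations: discrete (DR), partial overlap (PO), proper part (PP),
# proper part inverse (PPi), equal (EQ).
ALL = frozenset({"DR", "PO", "PP", "PPi", "EQ"})
INV = {"DR": "DR", "PO": "PO", "PP": "PPi", "PPi": "PP", "EQ": "EQ"}

# COMP[(r1, r2)]: possible relations between a and c given a r1 b and b r2 c.
# Compositions with EQ are handled in compose_base.
COMP = {
    ("DR", "DR"): ALL, ("DR", "PO"): {"DR", "PO", "PP"},
    ("DR", "PP"): {"DR", "PO", "PP"}, ("DR", "PPi"): {"DR"},
    ("PO", "DR"): {"DR", "PO", "PPi"}, ("PO", "PO"): ALL,
    ("PO", "PP"): {"PO", "PP"}, ("PO", "PPi"): {"DR", "PO", "PPi"},
    ("PP", "DR"): {"DR"}, ("PP", "PO"): {"DR", "PO", "PP"},
    ("PP", "PP"): {"PP"}, ("PP", "PPi"): ALL,
    ("PPi", "DR"): {"DR", "PO", "PPi"}, ("PPi", "PO"): {"PO", "PPi"},
    ("PPi", "PP"): {"PO", "PP", "PPi", "EQ"}, ("PPi", "PPi"): {"PPi"},
}


def compose_base(r1, r2):
    if r1 == "EQ":
        return frozenset({r2})
    if r2 == "EQ":
        return frozenset({r1})
    return frozenset(COMP[(r1, r2)])


def compose(s1, s2):
    """Weak composition of two disjunctive relations (sets of base relations)."""
    out = set()
    for r1, r2 in product(s1, s2):
        out |= compose_base(r1, r2)
    return frozenset(out)


def inverse(s):
    return frozenset(INV[r] for r in s)


def path_consistency(n, constraints):
    """Refine the complete network over regions 0..n-1; None if inconsistent.

    `constraints` maps (i, j), i != j, to a set of base relations. Unlisted
    pairs start as ALL, so the network has n * (n - 1) entries."""
    net = {(i, j): ALL for i in range(n) for j in range(n) if i != j}
    for (i, j), rels in constraints.items():
        net[(i, j)] = net[(i, j)] & frozenset(rels)
        net[(j, i)] = inverse(net[(i, j)])
        if not net[(i, j)]:
            return None
    queue = deque(net)  # every pair starts out queued
    while queue:
        i, j = queue.popleft()
        for k in range(n):
            if k in (i, j):
                continue
            # Tighten (i, k) through j, then (k, j) through i.
            for a, b, c in ((i, j, k), (k, i, j)):
                refined = net[(a, c)] & compose(net[(a, b)], net[(b, c)])
                if refined != net[(a, c)]:
                    if not refined:
                        return None  # empty relation: network is inconsistent
                    net[(a, c)] = refined
                    net[(c, a)] = inverse(refined)
                    queue.append((a, c))
    return net


# Hypothetical regions: 0 = West Greece, 1 = Olympia, 2 = a fire.
net = path_consistency(3, {(0, 1): {"PPi"}, (1, 2): {"PPi"}})
print(sorted(net[(0, 2)]))  # ['PPi']
print(path_consistency(3, {(0, 1): {"PPi"}, (1, 2): {"PPi"}, (2, 0): {"DR"}}))  # None
```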
15. Experimental evaluation
• Computation of the complete constraint network
• Running time: O(n³)
• Memory requirements: O(n²), with n ≈ thousands to millions
[Figure: results for one dataset of hundreds of regions and three datasets of thousands of regions; one result is annotated "after one day"]
Setup: Intel Xeon E5620, 2.4 GHz, 12MB L3, 48GB RAM, RAID 5, Ubuntu 12.04
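A back-of-the-envelope calculation shows why the complete network is the bottleneck. Assuming (our assumption, not a measurement from the experiments) that each ordered pair stores its set of RCC-8 base relations as an 8-bit mask, i.e. one byte per edge, and counting n³ triangle checks up to a constant factor:

```latex
\begin{aligned}
n = 10^{4} &:\quad n^{2} = 10^{8}\ \text{bytes} \approx 100\ \text{MB}, \qquad n^{3} = 10^{12} \\
n = 10^{6} &:\quad n^{2} = 10^{12}\ \text{bytes} \approx 1\ \text{TB} \approx 20 \times 48\ \text{GB}, \qquad n^{3} = 10^{18}
\end{aligned}
```

At a million regions the dense network alone is roughly twenty times the 48 GB of RAM in the setup above, before any of the cubic work begins.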
16. Network structure
• We have started working on algorithms taking into
account the structure of these networks:
o Node degrees fit a power-law distribution
o Network is sparse
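Both properties can be measured directly from an edge list. The helper below is a hypothetical sketch: it reports the edge density m / (n(n − 1)/2) and the degree histogram one would fit a power law to, but it does not perform the fit. For a tree-shaped hierarchy with n regions and n − 1 edges the density is 2/n, which is why such networks are sparse.

```python
from collections import Counter


def network_profile(edges):
    """Return (number of regions, edge density, degree histogram) for an
    undirected constraint network given as (region, region) pairs."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n, m = len(degree), len(edges)
    density = m / (n * (n - 1) / 2) if n > 1 else 0.0
    histogram = Counter(degree.values())  # degree -> number of regions
    return n, density, dict(sorted(histogram.items()))


# Hypothetical hierarchy: one country, 3 regions, 2 municipalities per region.
edges = [("C", f"R{i}") for i in range(3)]
edges += [(f"R{i}", f"M{i}{j}") for i in range(3) for j in range(2)]
print(network_profile(edges))  # (10, 0.2, {1: 6, 3: 4})
```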
17. Network structure (cont’d)
• Edges of three kinds:
o non-tangential proper part (NTPP)
o externally connected (EC)
o equals (EQ)
• These edges reflect networks composed of components with
hierarchical structure
o R-tree extensions (Papadias, Kalnis, Mamoulis, AAAI’99)
• Parallel algorithms combined with backward-chaining
techniques for lazy query processing
o Graph partitioning
o Path compression data structures and indexes
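One possible reading of these directions, and it is only our assumption, is a union-find structure with path compression that collapses EQ-connected regions (for instance, the same administrative area published by two datasets), combined with parent pointers along NTPP edges, so that containment is answered by walking up a hierarchy instead of materialising the complete network. The sketch assumes each region has at most one direct NTPP container, ignores EC edges, and uses invented URIs.

```python
class RegionHierarchy:
    """Hypothetical containment index for networks with NTPP and EQ edges.

    EQ edges are merged with union-find (path compression, no union by rank).
    NTPP edges are kept as one parent pointer per EQ class, which assumes the
    containment edges form a forest, as in an administrative hierarchy."""

    def __init__(self):
        self.uf = {}      # union-find parent pointers for EQ classes
        self.parent = {}  # class representative -> direct NTPP container

    def find(self, x):
        """Representative of x's EQ class, compressing the path on the way."""
        self.uf.setdefault(x, x)
        root = x
        while self.uf[root] != root:
            root = self.uf[root]
        while x != root:  # path compression: point visited nodes at root
            nxt = self.uf[x]
            self.uf[x] = root
            x = nxt
        return root

    def add_eq(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.uf[rb] = ra
            # Keep one container for the merged class (forest assumption).
            if rb in self.parent and ra not in self.parent:
                self.parent[ra] = self.parent[rb]

    def add_ntpp(self, a, b):
        """Record that a is a non-tangential proper part of b."""
        self.parent[self.find(a)] = b

    def inside(self, a, b):
        """True if following NTPP parents from a reaches b's EQ class."""
        target, x = self.find(b), self.find(a)
        seen = set()
        while x in self.parent and x not in seen:
            seen.add(x)
            x = self.find(self.parent[x])
            if x == target:
                return True
        return False


h = RegionHierarchy()
h.add_ntpp("ex:Olympia", "gadm:WestGreece")
h.add_ntpp("nuts:WestGreece", "ex:Greece")
h.add_eq("gadm:WestGreece", "nuts:WestGreece")  # same area in two datasets
print(h.inside("ex:Olympia", "ex:Greece"))  # True
print(h.inside("ex:Greece", "ex:Olympia"))  # False
```

The index stores O(n) pointers instead of the O(n²) complete network; the price is that it only answers containment along the hierarchy and says nothing about disjunctive constraints.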
18. Related work: Spatial
• Qualitative spatial reasoning
- Efficient algorithms for consistency checking of constraint networks (complex spatial relations, small number of regions)
- Do not consider query processing
• Description logic reasoners
- PelletSpatial: RCC-8 reasoning (cannot handle disjunctions)
- RacerPro: RCC-8 reasoning
19. Related work: Temporal
• Chaudhuri (VLDB’88)
• The knowledge representation language Telos (TOIS’90)
• Foundations of temporal constraint databases (Koubarakis,
PhD thesis, ‘94)
• Qualitative temporal reasoning community (since the 1980s)
• SQL+i system (BNCOD‘96)
• Later system (IEEE’97)
• Hurtado and Vaisman (2006)
20. Conclusions
• What’s the CHALLENGE?
Implementing an efficient query processing system
for incomplete geospatial information in RDFi
• The desired system should:
o reason about qualitative and quantitative spatial
information that might be incomplete
o be scalable to billions of triples in the most useful cases
Ordnance Survey is Great Britain's national mapping authority. It offers digital and paper map products for a wide range of business and outdoor uses.
GADM is a spatial database of the location of the world's administrative areas for use in GIS and similar software.
NUTS is a hierarchical system defined by the Eurostat office of the European Union for dividing the economic territory of the EU into four levels.
Chaudhuri (VLDB'88): framework for temporal relationships in a database employing a graph model (limited to definite information).
The knowledge representation language Telos (1991): preliminary Prolog implementation by M. Koubarakis and T. Topaloglou. The most efficient implementation of Telos (ConceptBase) does not consider incomplete information.
Foundations of temporal constraint databases (Koubarakis, PhD thesis 1994): database models for (indefinite) temporal constraint databases.
SQL+i (1996): temporal RDBMS for modeling and querying indeterminate temporal facts; representation and reasoning employing constraint networks.
Later system (1997): querying of temporal knowledge bases; limited query language (no disjunctive expressions).