This document discusses integrating authoritative and volunteered geographic information using an ontological approach. It presents the problems caused by semantic heterogeneity between different data sources. The proposed approach uses a domain ontology and R2RML mappings to provide a common conceptualization and allow flexible integration of datasets in RDF. Current work is analyzing semantic heterogeneity in OpenStreetMap tags by studying how tags used for real-world features vary over spatial scale and time. Future work includes developing the ontology further and creating a user interface for the R2RML mappings.
1. INTEGRATING AUTHORITATIVE
AND VOLUNTEERED
GEOGRAPHIC INFORMATION -
AN ONTOLOGICAL APPROACH
Crowd Sourcingin
NationalMapping
InternshipFunding
ACTIVITYWorkshop
Leuven(Belgium)
14thMay2013
JimenaMartínezRamos
[jmr125@mun.ca]
2. Table of contents
2
1. Background
2. Problem
3. Objective
4. Proposed approach
5. Semantics in OSM datasets (ongoing work)
6. Conclusions and future work
3. Background
3
(*) http://ggim.un.org/ (UN Report, 2012)
National Mapping Agencies (NMAs) are likely to
find difficult to justify the costs of traditional
data maintenance mechanisms.
VGI projects (like OpenStreetMap) are growing
and are seen as a good data source to be
integrated with authoritative datasets.
There is growing need of integrating different
data sources.
Semantic interoperability is still an issue in the
integration problem.
TheneedofintegratingGeographicInformationfromdifferentsources
?
Levels of heterogeneity
System
Syntactic
Structural
Semantic
(meaning of words)
8. motorway
trunk
freeway
Official source 1
turnpike
Proposed Approach
8
Themethodaccordingtheobjectives
“turnpike”
R2RML
mapping
R2RML
mapping
R2RML
mapping
RDF
RDF
RDF
Standard
Domain
Ontology
Reality
Common
Knowledge
“motorway”
Official source 1
OpenStreetMap
9. Quebec
• <highway=bus_stop>
St. John´s
▪ <public_transport=stop_position>
9
Morethanonetagperreal-worldphenomenon (synonymy)
Ongoingwork:SemanticHeterogeneityin OSM datasets
▪▪▪
▪▪▪
▪
▪
Number of tags per phenomenon increases with the scale, and % is important
•<highway=bus_stop> ▪<public_transport=stop_position>
11. Conclusions and future work
11
Proposed approach is based in a domain ontology, which allows:
• Matching datasets to a common pivot (R2RML allows flexible and direct
mappings)
• No need to know how to handle ontologies.
• Reusing the mappings.
Semantic Heterogeneity in OSM datasets.
• Number of tags and their % of occurrence per real-world phenomenon
• Time and spatial scale are factors affecting SH in OSM datasets.
Future work:
• Developing more the ontology.
• User-friendly interface for making R2RML mappings.
• Deeper study factors involved in SH in OSM datasets, trying to model it.