Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and Mapping Knowledge

Ontology-Based Data Access Mapping Generation
via Data, Schema, Query, and Mapping Knowledge
Pieter Heyvaert
pheyvaer.heyvaert@ugent.be

Semantic Web technologies rely on Linked Data
querying
visualizations
publishing

But not all data is accessible as Linked Data
databases
XML files
JSON files

Solutions to provide access exist
manual: completely done by the user
semi-automatic: users provide feedback
automatic: no user interaction required

But they have limitations
limited to specific use cases
limited support for complex use cases

PhD’s goal: improve access to Linked Data

Overview
problem
current solutions
research questions
hypotheses
research methodology & approach
preliminary results
evaluation plan

How do we provide access?
non-Linked Data → ? → Linked Data
example of non-Linked Data, table: authors
id  name           genre
0   J.K. Rowling   fiction
1   George Orwell  non-fiction

Apply mappings on non-Linked Data
non-Linked Data → mapping → Linked Data
mapping: rules to generate RDF terms and triples using data and ontologies

non-Linked Data → mapping → Linked Data
table: authors (id, name, genre)
rule: create url from id
rule: name is value for ex:fullname
rule: if genre is ‘fiction’, class is ex:FictionAuthor; else class is ex:NonFictionAuthor

non-Linked Data → mapping → Linked Data
table: authors (id, name, genre)
resulting Linked Data:
ex:0 a ex:FictionAuthor .
ex:0 ex:fullname "J.K. Rowling" .
ex:1 a ex:NonFictionAuthor .
ex:1 ex:fullname "George Orwell" .
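
The three rules can be read as a small function applied to every row of the table. The following is a minimal sketch in Python, not the mapping language used in this work: the function name `map_author` and the in-memory representation of the table are assumptions made for illustration.

```python
# Rows of table "authors", as shown on the earlier slide.
authors = [
    {"id": 0, "name": "J.K. Rowling", "genre": "fiction"},
    {"id": 1, "name": "George Orwell", "genre": "non-fiction"},
]

def map_author(row):
    """Apply the three mapping rules to one row (hypothetical helper)."""
    subject = "ex:" + str(row["id"])            # rule: create url from id
    if row["genre"] == "fiction":               # rule: class depends on genre
        author_class = "ex:FictionAuthor"
    else:
        author_class = "ex:NonFictionAuthor"
    name_literal = '"' + row["name"] + '"'      # rule: name is value for ex:fullname
    return [
        (subject, "a", author_class),
        (subject, "ex:fullname", name_literal),
    ]

triples = [triple for row in authors for triple in map_author(row)]
for s, p, o in triples:
    print(s, p, o, ".")
```

The class rule is the part that depends on a value rather than on the table, which is the kind of rule the later slides single out as hard to generate automatically.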

Mappings need to be created
from scratch (single-scenario use case): mapping A
by reusing previous mappings (multi-scenario use case): mapping B + mapping C → new mapping

(Semi-)automatic methods are preferred
mapping creation: manual vs. (semi-)automatic

Still a number of challenges left
dealing with complex data (schemas)
not all techniques work on single-scenario use cases

Dealing with complex data (schemas)
e.g., when the class of an entity does not depend on the table, but on a value
table: authors (id, name, genre)
rule: if genre is ‘fiction’, class is ex:FictionAuthor; else class is ex:NonFictionAuthor

Not all techniques work on single-scenario use cases
because they rely on readily available previous mappings
multi-scenario: the mapping of scenario A is reused for scenario B
single-scenario: no previous mapping is available for scenario B, so there is nothing to reuse

Current solutions
What knowledge is used?
How is this knowledge used?
What knowledge is not used?

What do current solutions use?
knowledge from the mapping process
existing knowledge outside the mapping process

Knowledge from mapping process is used
data
data schema
ontologies
not all elements are required

Existing knowledge is used
data
data schemas
mappings
ontologies
Linked Data
not all elements are required

How is all this knowledge used?
data schema + existing ontology
data + existing mapping

Data schema + existing ontology
1. derive a new ontology from the data schema
2. match the new ontology with an existing ontology
3. generate the mapping from the matches
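
The schema-plus-ontology method can be sketched in three steps: derive candidate terms from the data schema, match them against an existing ontology, and turn the matches into a mapping. The sketch below is only an illustration under stated assumptions: the existing ontology and its terms are made up, and plain string similarity from Python's `difflib` stands in for whatever ontology matcher a real solution would use.

```python
from difflib import get_close_matches

schema = {"table": "authors", "columns": ["id", "name", "genre"]}

# Hypothetical existing ontology; these terms are invented for illustration.
existing = {
    "classes": ["Author", "Book"],
    "properties": ["fullname", "genre", "title"],
}

def derive_ontology(schema):
    # Step 1: the table becomes a candidate class,
    # the non-key columns become candidate properties.
    properties = [c for c in schema["columns"] if c != "id"]
    return {"class": schema["table"], "properties": properties}

def best_match(term, candidates, cutoff=0.6):
    # Step 2: case-insensitive string similarity as a stand-in matcher.
    by_lower = {c.lower(): c for c in candidates}
    hits = get_close_matches(term.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None

def generate_mapping(schema, existing):
    # Step 3: keep only the candidate terms that found a match.
    new = derive_ontology(schema)
    matched = {p: best_match(p, existing["properties"]) for p in new["properties"]}
    return {
        "class": best_match(new["class"], existing["classes"]),
        "properties": {p: m for p, m in matched.items() if m is not None},
    }

mapping = generate_mapping(schema, existing)
print(mapping)
```

Note that a value-dependent class such as ex:FictionAuthor cannot come out of this sketch, because the schema alone does not reveal that the class depends on the genre column.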

Data + existing mapping
1. extract classes and properties from the data
2. extract classes, properties, and the model from the existing mapping
3. generate the new mapping from both

These methods are not combined
only a single method is used
combining multiple methods has not been explored

What knowledge do current solutions not use?
not all knowledge from previous mappings
neglect query workload

Not all knowledge from previous mappings is used
data transformations
to lowercase
substring
conditions: if-else rules

Query workload is neglected
queries to be executed on the not-yet-existing Linked Data
queries contain knowledge
model
used ontologies
annotations

How can we use queries?
select * where {
  ?s a ex:FictionAuthor .
  ?s ex:fullname ?n .
}
table: authors (id, name, genre)
ontology to use: http://example.com
model + annotations: ex:FictionAuthor, ex:fullname

Research questions
discover existing knowledge
use discovered knowledge

Question 1: how can we discover existing knowledge that is relevant?
candidate existing knowledge: mappings, ontologies, (Linked) Data, query workload, data schema

Question 2: how can we use the discovered knowledge to generate a new mapping?
existing knowledge: mappings, ontologies, (Linked) Data, query workload, data, data schema
knowledge from the mapping process: ontologies, query workload, data schema
both → new mapping

Overview
problem statement
research questions
hypotheses
research methodology & approach
preliminary results
evaluation plan

Hypotheses
improve quality
decrease task complexity

Hypothesis 1: using existing knowledge improves
the quality of a new single-scenario mapping.
quality → fitness for use

Hypothesis 2: using existing knowledge
decreases the task complexity of the mapping process.
Liu and Li developed a model to measure task complexity.
5 characteristics that influence the task’s performance

Task complexity has 5 characteristics
input: e.g., data, ontologies, user feedback
output: Linked Data, mapping
process: steps, user actions
duration: time to complete task
presentation: user interface

Two aspects need to be tackled
discover knowledge
use knowledge
both can be tackled separately

Discover existing knowledge
infer knowledge from mapping process where possible
find relevant other existing knowledge via similarity metrics

Infer knowledge from mapping process
e.g., infer data schema from data
e.g., infer ontology from queries

Infer data schema from data
table: authors
columns: id, name, genre
id: index, integer
name: string
genre: string (‘fiction’ or ‘non-fiction’)
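
A rough idea of how such a schema could be inferred from the rows themselves is sketched below. This is an assumption-laden illustration, not the tool developed in this work: the function `infer_schema` is hypothetical, and the third row is made up so that the genre column contains a repeated value.

```python
rows = [
    {"id": 0, "name": "J.K. Rowling", "genre": "fiction"},
    {"id": 1, "name": "George Orwell", "genre": "non-fiction"},
    {"id": 2, "name": "Jane Doe", "genre": "fiction"},  # made-up row for illustration
]

def infer_schema(table, rows):
    """Infer a simple type, uniqueness, and enumerated values per column."""
    columns = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        info = {
            "type": "integer" if all(isinstance(v, int) for v in values) else "string",
            "unique": len(set(values)) == len(values),
        }
        # Repeated string values hint at an enumeration, as with genre.
        if info["type"] == "string" and not info["unique"]:
            info["values"] = sorted(set(values))
        columns[col] = info
    return {"table": table, "columns": columns}

schema = infer_schema("authors", rows)
for name, info in schema["columns"].items():
    print(name, info)
```

With only the two rows from the slide, name and genre would look identical to this heuristic, which shows how much such inference depends on the amount of data available.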

Infer ontology from queries
select * where {
  ?s a ex:FictionAuthor .
  ?s ex:fullname ?n .
}
inferred ontology: http://example.com
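
To make the idea concrete, the sketch below pulls classes and properties out of the query with a regular expression. It is a simplification that only handles plain triple patterns like the ones above, not full SPARQL, and the binding of the prefix ex: to http://example.com/ is an assumption, since the query on the slide has no PREFIX declaration.

```python
import re

PREFIXES = {"ex": "http://example.com/"}  # assumed prefix binding

QUERY = """
select * where {
  ?s a ex:FictionAuthor .
  ?s ex:fullname ?n .
}
"""

# Matches "subject predicate object ." for simple triple patterns only.
TRIPLE = re.compile(r"(\S+)\s+(\S+)\s+(\S+)\s*\.")

def expand(term):
    """Turn a prefixed name such as ex:fullname into a full IRI."""
    prefix, local = term.split(":", 1)
    return PREFIXES[prefix] + local

def infer_terms(query):
    """Collect the classes and properties a query expects to find."""
    body = query[query.index("{") + 1 : query.rindex("}")]
    classes, properties = set(), set()
    for _subject, predicate, obj in TRIPLE.findall(body):
        if predicate == "a":
            if not obj.startswith("?"):
                classes.add(expand(obj))      # object of "a" is a class
        elif not predicate.startswith("?"):
            properties.add(expand(predicate))  # any other predicate is a property
    return classes, properties

classes, properties = infer_terms(QUERY)
print(classes, properties)
```

The namespace shared by the extracted terms points to the ontology the query author expects, which is the knowledge the mapping generation could pick up.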

Find relevant existing knowledge via similarity metrics
1. determine similarity between the mapping process and existing mappings
2. consider similar existing mappings in the mapping process
data schema in the mapping process:
table: authors
columns: id, name, genre
id: index, integer, unique
name: string
genre: string (‘fiction’ or ‘non-fiction’)
data schema of an existing mapping:
table: author
columns: id, fullname, genres
id: index, integer
fullname: string
genres: string
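
As an illustration of step 1, the sketch below compares the column names of the two schemas with two simple metrics. Both metrics are assumptions chosen for the example, since which metrics to use is an open question of this work: exact overlap of column names (Jaccard), and the average best string similarity per column using Python's `difflib`.

```python
from difflib import SequenceMatcher

new_columns = ["id", "name", "genre"]            # table: authors
existing_columns = ["id", "fullname", "genres"]  # table: author

def jaccard(a, b):
    """Share of column names that are exactly equal in both schemas."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def fuzzy(a, b):
    """Average, over the columns of a, of the best string similarity in b."""
    best = [max(SequenceMatcher(None, x, y).ratio() for y in b) for x in a]
    return sum(best) / len(best)

exact_score = jaccard(new_columns, existing_columns)
fuzzy_score = fuzzy(new_columns, existing_columns)
print(exact_score, fuzzy_score)
```

Only id matches exactly, so the exact metric rates the schemas as quite different, while the fuzzy metric also credits name/fullname and genre/genres. The gap between the two scores shows why the choice and combination of metrics matters.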

Similarity metrics on individual elements or combinations of elements
metrics on data schema, ontologies, data, and query workload
PhD:
Which metrics do we use?
How do we combine the different metrics?

Two aspects need to be tackled
use knowledge

Use knowledge
work with existing methods, e.g.:
data schema + existing ontology
data + existing mappings
PhD:
how do we include new knowledge?
how do we combine these methods?

Preliminary Results
RMLEditor
RMLWorkbench
mapping generation approaches
hierarchical data analysis

RMLEditor eases the creation of mappings
GUI so domain experts can create mappings
users can view the data, mappings, and RDF triples
usable by both non-SW and SW experts
PhD: present mappings to get feedback during mapping process

RMLWorkbench eases generation and publication
graphical user interface so domain experts can administer
Linked Data generation
publication workflow
PhD: manage elements of the mapping generation process

Identified mapping generation approaches
data-driven
schema-driven
model-driven
result-driven
PhD:
provides insights on how users work
this can be applied when developing a (semi-)automatic approach

Developed tool for data analysis on hierarchical data
efficient discovery of unique identifiers in hierarchical data
PhD: to infer knowledge within the mapping process

Evaluation Plan
mapping quality
task complexity

Evaluate mapping quality
existing benchmark RODI
great for tabular data
no support for other formats, such as hierarchical data formats

Evaluate task complexity via 5 characteristics
input: e.g., data, ontologies, user feedback
output: Linked Data, mapping
process: steps, user actions
duration: time to complete task
presentation: user interface

Current evaluations are limited to a single aspect
only duration
only number of user actions
only precision and recall
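
For reference, precision and recall of a generated mapping can be computed by comparing the triples it produces with a set of reference triples. The sketch below assumes set-based comparison of exact triples; the generated output is hypothetical and contains one wrong class and one missing name on purpose.

```python
# Reference triples: the expected Linked Data for table "authors".
reference = {
    ("ex:0", "a", "ex:FictionAuthor"),
    ("ex:0", "ex:fullname", "J.K. Rowling"),
    ("ex:1", "a", "ex:NonFictionAuthor"),
    ("ex:1", "ex:fullname", "George Orwell"),
}

# Hypothetical output of a generated mapping, with deliberate errors.
generated = {
    ("ex:0", "a", "ex:FictionAuthor"),
    ("ex:0", "ex:fullname", "J.K. Rowling"),
    ("ex:1", "a", "ex:FictionAuthor"),  # wrong class
}

correct = generated & reference
precision = len(correct) / len(generated)  # share of generated triples that are right
recall = len(correct) / len(reference)     # share of reference triples that were found
print(precision, recall)
```

Here 2 of the 3 generated triples are correct and 2 of the 4 reference triples are found. Neither number says anything about duration or user actions, which is the limitation this slide points out.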

Roundup
improve single-scenario mappings by discovering and using existing knowledge
Which similarity metrics do we use for discovery?
How do we use and combine
the different methods and knowledge?
