1. May 14, 2014
RDF Analytics
Lenses over Semantic Graphs
Dario Colazzo 3,1
François Goasdoué 4,1
Ioana Manolescu 1,2
Alexandra Roatiş 2,1
1 OAK – Inria, France
2 LRI – Université Paris-Sud, France
3 LAMSADE – Université Paris Dauphine, France
4 PILGRIM – Université Rennes 1, France
2. RDF Analytics: Lenses over Semantic Graphs May 14, 2014 – 2
RDF data warehousing scenario
Alice
software engineer
IT company
builds user applications
open RDF data (Grenoble)
worksFor
DS: Restaurants
(i) heterogeneous data
App: clickable map m
#restaurants
region & average rating
type of cuisine
build
RDW: relational data warehouse
extract tabular data (SPARQL queries)
merge
(ii) new central concepts
DS2: Shops
DS3: Museums
RDW2 RDW3
(iii) other missing relationships?
Bug: landmarks museums
find
redesign
Feature: query relationships
region famous people
(iv) query schema
add
Feature: new type of aggregation
for each landmark, show how many restaurants are nearby
(v) impossible! (separate star schemas; restaurants and landmarks are both central entities)
add
12. RDF data warehousing
Application needs:
(i) support of heterogeneous data
(ii) multiple central concepts
(iii) support for RDF semantics when querying
(iv) possibility to query the relationships between entities (the schema)
(v) flexible choice of aggregation dimensions
This work:
redesign the core data analytics concepts and tools for RDF
formal framework for warehouse-style analytics on RDF data
suited to heterogeneous, semantic-rich corpora of Linked Data
13. Summary
1. RDF Graphs & BGP Queries
2. RDF Graph Analysis
3. On-Line Analytical Processing
4. Empirical Evaluation
5. Sum Up
14. RDF Graphs & BGP Queries
– recall –
15. The Resource Description Framework (RDF)
RDF graph – set of triples
Assertion    Triple          Relational notation
Class        s rdf:type o    o(s)
Property     s p o           p(s, o)
[Figure: example RDF graph over the nodes user1, user2, Bill, 28, Madrid, Student, _:b1 and blog1, with edges labeled worksWith, hasName, hasAge, inCity, rdf:type, wrote and inBlog. Legend: resource (URI), blank node, literal (string), property.]
16. RDF Schema (RDFS)
– declare semantic constraints between classes and properties
Constraint       Triple                    Relational notation
Subclass         s rdfs:subClassOf o       s ⊆ o
Subproperty      s rdfs:subPropertyOf o    s ⊆ o
Domain typing    s rdfs:domain o           Π_domain(s) ⊆ o
Range typing     s rdfs:range o            Π_range(s) ⊆ o
[Figure: example RDFS constraints over the classes Person and Student and the properties knows and worksWith, with edges labeled rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain and rdfs:range.]
17. Open-world assumption and RDF entailment
RDF data model – based on the open-world assumption.
→ deductive constraints – implicitly propagate tuples
Entailment – reasoning mechanism
set of explicit triples + some entailment rules → derive implicit triples
Exhaustive application of entailment → saturation (closure)
The semantics of an RDF graph is its saturation.
[Figure: user1 rdf:type Student, Student rdfs:subClassOf Person, and the entailed (implicit) triple user1 rdf:type Person.]
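Saturation can be sketched as a fixpoint computation. The Python below is a minimal illustration, not the authors' implementation: it assumes triples are plain string tuples and applies only the four instance-level RDFS rules listed on the RDFS slide (subclass, subproperty, domain, range). The names `saturate` and `GRAPH` are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

def saturate(graph):
    """Return the closure of `graph` under four instance-level RDFS rules."""
    closure = set(graph)
    while True:
        derived = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                # s rdf:type C, C rdfs:subClassOf D  =>  s rdf:type D
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    derived.add((s, RDF_TYPE, o2))
                # s p o, p rdfs:subPropertyOf q  =>  s q o
                if p2 == SUBPROP and p == s2:
                    derived.add((s, o2, o))
                # s p o, p rdfs:domain C  =>  s rdf:type C
                if p2 == DOMAIN and p == s2:
                    derived.add((s, RDF_TYPE, o2))
                # s p o, p rdfs:range C  =>  o rdf:type C
                if p2 == RANGE and p == s2:
                    derived.add((o, RDF_TYPE, o2))
        if derived <= closure:  # fixpoint reached: nothing new was derived
            return closure
        closure |= derived

# The two explicit triples of the slide's example.
GRAPH = {
    ("user1", RDF_TYPE, "Student"),
    ("Student", SUBCLASS, "Person"),
}
closure = saturate(GRAPH)
print(("user1", RDF_TYPE, "Person") in closure)
```

The quadratic nested loop keeps the sketch short; a real system would index triples by property.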
18. Basic Graph Pattern (BGP) queries
→ subset of SPARQL; BGP – conjunctions of triple patterns
q(y) :- x rdf:type Person, x hasName y
query evaluation vs. query answering
query evaluation – uses only the graph’s explicit triples
query answering – the (complete) answer set is obtained by evaluating q against the graph’s saturation
[Figure: the graph of the previous slide (user1 rdf:type Student, Student rdfs:subClassOf Person, implicit user1 rdf:type Person), extended with a hasName edge to Bill.]
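The difference between evaluation and answering can be illustrated with a small BGP matcher. This is a sketch under assumptions: variables are written with a leading `?`, the triple user1 hasName Bill is read off the figure, and the saturated graph is written out by hand instead of being computed. The function name `match` is hypothetical.

```python
def match(patterns, graph, binding=None):
    """Yield every binding of ?variables that maps all patterns onto triples."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in graph:
        candidate = dict(binding)
        # A variable must agree with any earlier binding; a constant must be equal.
        if all(
            candidate.setdefault(t, v) == v if t.startswith("?") else t == v
            for t, v in zip(patterns[0], triple)
        ):
            yield from match(patterns[1:], graph, candidate)

# q(y) :- x rdf:type Person, x hasName y
Q = [("?x", "rdf:type", "Person"), ("?x", "hasName", "?y")]

EXPLICIT = {
    ("user1", "rdf:type", "Student"),
    ("Student", "rdfs:subClassOf", "Person"),
    ("user1", "hasName", "Bill"),  # assumption: the hasName edge starts at user1
}
# Saturation adds the implicit triple entailed by the subclass constraint.
SATURATED = EXPLICIT | {("user1", "rdf:type", "Person")}

evaluation = {b["?y"] for b in match(Q, EXPLICIT)}
answer = {b["?y"] for b in match(Q, SATURATED)}
print(evaluation, answer)
```

Evaluation over the explicit triples finds no Person, while answering over the saturation returns Bill.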
19. RDF Graph Analysis
– formal framework for warehousing RDF data –
21. Analytical schema (AnS) and instance (I)
RDF graph:
[Figure: RDF graph over the nodes Person, user1, user2, Bill, post1, post2, blog1 and Code Blog, with edges labeled rdf:type (×2), hasName (×2), wrote (×2) and inBlog (×2).]
Analytical schema:
→ labeled directed graph
n1: λ(n1) ← Blogger
    δ(n1) ← q(x) :- x rdf:type Person, x wrote y, y inBlog z
n2: λ(n2) ← Name
    δ(n2) ← q(x) :- y hasName x
e2: λ(e2) ← identifiedBy
    δ(e2) ← q(x, y) :- x rdf:type Person, x hasName y
Instance of the analytical schema w.r.t. the graph:
x rdf:type λ(n1)
user1 rdf:type Blogger
user2 rdf:type Blogger
x λ(e2) y
user1 identifiedBy Bill
x rdf:type λ(n2)
Bill rdf:type Name
Code Blog rdf:type Name
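Building the instance amounts to evaluating each δ query and emitting one triple per answer. The sketch below is illustrative only. The assignment of posts to users is an assumption (the slide shows two wrote edges without recoverable endpoints), and `match`, `answers` and `instance` are hypothetical names. No RDFS constraints occur in this graph, so evaluation and answering coincide here.

```python
def match(patterns, graph, binding=None):
    """Yield every binding of ?variables that maps all patterns onto triples."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in graph:
        candidate = dict(binding)
        if all(
            candidate.setdefault(t, v) == v if t.startswith("?") else t == v
            for t, v in zip(patterns[0], triple)
        ):
            yield from match(patterns[1:], graph, candidate)

def answers(head, body, graph):
    """Project the bindings of `body` onto the head variables."""
    return {tuple(b[v] for v in head) for b in match(body, graph)}

GRAPH = {
    ("user1", "rdf:type", "Person"), ("user2", "rdf:type", "Person"),
    ("user1", "hasName", "Bill"), ("blog1", "hasName", "Code Blog"),
    ("user1", "wrote", "post1"), ("user2", "wrote", "post2"),  # assumed endpoints
    ("post1", "inBlog", "blog1"), ("post2", "inBlog", "blog1"),
}

# label -> (head variables, body), mirroring λ and δ for nodes and edges.
NODES = {
    "Blogger": (("?x",), [("?x", "rdf:type", "Person"),
                          ("?x", "wrote", "?y"), ("?y", "inBlog", "?z")]),
    "Name": (("?x",), [("?y", "hasName", "?x")]),
}
EDGES = {
    "identifiedBy": (("?x", "?y"), [("?x", "rdf:type", "Person"),
                                    ("?x", "hasName", "?y")]),
}

def instance(nodes, edges, graph):
    """Materialize the analytical schema instance as a set of RDF triples."""
    triples = set()
    for label, (head, body) in nodes.items():
        triples |= {(x, "rdf:type", label) for (x,) in answers(head, body, graph)}
    for label, (head, body) in edges.items():
        triples |= {(x, label, y) for (x, y) in answers(head, body, graph)}
    return triples

I = instance(NODES, EDGES, GRAPH)
for t in sorted(I):
    print(t)
```

The result is itself a set of triples, so it can be queried with the same matcher.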
24. Analytical schema (AnS) and instance (I)
→ data heterogeneity is preserved
→ the analytical schema is easy to extend
29. Analytical query (AnQ)
Analytical schema:
n1 : Blogger, n2 : City, n3 : Value, n4 : BlogPost, n5 : Site
e2 : from (n1 → n2), e3 : age (n1 → n3), e4 : posted (n1 → n4), e5 : on (n4 → n5)
Instance:
[Figure: instance graph over the bloggers user1, user2 and user3, the ages 28, 40 and 35, the cities Madrid and New York, the posts post1 to post4 and the sites blog1 and blog2, with edges labeled age, from, posted and on.]
Query: Find the number of sites where each blogger posts,
classified by the blogger’s age and city.
c(x, d1, d2) :- x age d1, x from d2
{ ⟨user1, “28”, “Madrid”⟩, ⟨user3, “35”, “New York”⟩ }
m(x, v) :- x posted y, y on v
{ ⟨user1, blog1⟩, ⟨user1, blog2⟩, ⟨user2, blog2⟩, ⟨user3, blog2⟩ }
count
{ ⟨“28”, “Madrid”, 2⟩, ⟨“35”, “New York”, 1⟩ }
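The three steps (classifier, measure, aggregate) can be traced in a few lines of Python. This is a sketch under assumptions, not the authors' engine: the instance triples are chosen to be consistent with the classifier and measure results shown on the slide, the exact post-to-blogger edges are guessed, the measure is treated as a set of pairs, and `match`, `answers` and `cube` are hypothetical names.

```python
from collections import defaultdict

def match(patterns, graph, binding=None):
    """Yield every binding of ?variables that maps all patterns onto triples."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in graph:
        candidate = dict(binding)
        if all(
            candidate.setdefault(t, v) == v if t.startswith("?") else t == v
            for t, v in zip(patterns[0], triple)
        ):
            yield from match(patterns[1:], graph, candidate)

def answers(head, body, graph):
    return {tuple(b[v] for v in head) for b in match(body, graph)}

INSTANCE = {
    ("user1", "age", "28"), ("user1", "from", "Madrid"),
    ("user2", "age", "40"),                       # user2 has no city
    ("user3", "age", "35"), ("user3", "from", "New York"),
    # Assumed edges, consistent with the measure result on the slide.
    ("user1", "posted", "post1"), ("user1", "posted", "post2"),
    ("user2", "posted", "post3"), ("user3", "posted", "post4"),
    ("post1", "on", "blog1"), ("post2", "on", "blog2"),
    ("post3", "on", "blog2"), ("post4", "on", "blog2"),
}

# c(x, d1, d2) :- x age d1, x from d2
classifier = answers(("?x", "?d1", "?d2"),
                     [("?x", "age", "?d1"), ("?x", "from", "?d2")], INSTANCE)
# m(x, v) :- x posted y, y on v
measure = answers(("?x", "?v"),
                  [("?x", "posted", "?y"), ("?y", "on", "?v")], INSTANCE)

# Group facts by their dimension values, then count their measure values.
groups = defaultdict(list)
for (x, d1, d2) in classifier:
    groups[(d1, d2)].extend(v for (mx, v) in measure if mx == x)
cube = {dims: len(values) for dims, values in groups.items()}
print(cube)
```

Note that user2 contributes a measure value but no classifier tuple, so it does not appear in the cube.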
31. Analytical query answering
through analytical schema materialization
through analytical query reformulation
Analytical schema:
n1: λ(n1) ← Blogger
    δ(n1) ← q(x) :- x rdf:type Person, x wrote y, y inBlog z
e1: λ(e1) ← acquaintedWith
    δ(e1) ← q(x, y) :- z rdfs:subPropertyOf knows, x z y
Query:
c(x, d) :- x rdf:type Blogger, x acquaintedWith d
Reformulated query:
c′(x, d) :- x rdf:type Person, x wrote y1, y1 inBlog y2, z1 rdfs:subPropertyOf knows, x z1 d
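Reformulation can be read as unfolding: each atom of the query over the analytical schema is replaced by the body of the matching δ definition, with head variables bound to the atom's arguments and all other variables renamed apart. The sketch below illustrates that idea. The fresh-variable scheme (`?v1`, `?v2`, ...) and the names `unfold` and `reformulate` are assumptions, and the code assumes query variables never start with `?v`.

```python
from itertools import count

# label -> (head variables, body) for the schema's node and edge definitions.
NODES = {
    "Blogger": (("?x",), [("?x", "rdf:type", "Person"),
                          ("?x", "wrote", "?y"), ("?y", "inBlog", "?z")]),
}
EDGES = {
    "acquaintedWith": (("?x", "?y"), [("?z", "rdfs:subPropertyOf", "knows"),
                                      ("?x", "?z", "?y")]),
}

def unfold(head, body, args, fresh):
    """Instantiate a definition: head variables -> args, other variables -> fresh."""
    renaming = dict(zip(head, args))
    atoms = []
    for atom in body:
        renamed = []
        for term in atom:
            if term.startswith("?"):
                if term not in renaming:
                    renaming[term] = f"?v{next(fresh)}"
                term = renaming[term]
            renamed.append(term)
        atoms.append(tuple(renamed))
    return atoms

def reformulate(query_body):
    """Rewrite a query over the analytical schema into one over the RDF graph."""
    fresh = count(1)
    result = []
    for (s, p, o) in query_body:
        if p == "rdf:type" and o in NODES:
            head, body = NODES[o]
            result += unfold(head, body, (s,), fresh)
        elif p in EDGES:
            head, body = EDGES[p]
            result += unfold(head, body, (s, o), fresh)
        else:
            raise ValueError(f"atom not covered by the schema: {(s, p, o)}")
    return result

# c(x, d) :- x rdf:type Blogger, x acquaintedWith d
QUERY = [("?x", "rdf:type", "Blogger"), ("?x", "acquaintedWith", "?d")]
reformulated = reformulate(QUERY)
for atom in reformulated:
    print(atom)
```

Up to variable names, the output has the same five atoms as the reformulated query on the slide.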
34. On-Line Analytical Processing
– applying OLAP operations –
39. Slice, dice, drill-in and drill-out
Query: Find the number of sites where each blogger posts,
classified by the blogger’s age and city.
c(x, d1, d2) :- x age d1, x from d2
m(x, v) :- x posted y, y on v
count
Slice: bind an aggregation dimension to a single value
cΣ(x, d1, d2) :- x age d1, x from d2
Σ = { d1 ← “35” }
Dice: bind several aggregation dimensions to sets of values
cΣ(x, d1, d2) :- x age d1, x from d2
Σ = { d1 ← {“28”}, d2 ← {“Madrid”, “Kyoto”} }
Drill-in: remove a dimension from the classifier
c′(x, d2) :- x from d2
Drill-out: add a dimension to the classifier
c′(x, d1, d2, d3) :- x age d1, x from d2, x acquaintedWith d3
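Slice and dice can be sketched as restrictions on the classifier's dimension values before aggregation. The Python below is an illustration only: it starts from the classifier and measure answer sets of the running blogger example, and `restrict` and `count_cube` are hypothetical names. Drill-in and drill-out change the classifier body itself, so they would need the new classifier to be evaluated and are not covered here.

```python
from collections import defaultdict

DIMS = ("d1", "d2")  # dimension variables of c(x, d1, d2)

# Answer sets of the classifier and measure queries of the running example.
CLASSIFIER = {("user1", "28", "Madrid"), ("user3", "35", "New York")}
MEASURE = {("user1", "blog1"), ("user1", "blog2"),
           ("user2", "blog2"), ("user3", "blog2")}

def restrict(classifier, sigma):
    """Keep tuples whose dimensions fall in sigma (dimension -> allowed values)."""
    kept = set()
    for fact, *values in classifier:
        if all(values[DIMS.index(d)] in allowed for d, allowed in sigma.items()):
            kept.add((fact, *values))
    return kept

def count_cube(classifier, measure):
    """Count measure values per combination of dimension values."""
    groups = defaultdict(int)
    for fact, *values in classifier:
        groups[tuple(values)] += sum(1 for x, _ in measure if x == fact)
    return dict(groups)

# Slice: Σ = { d1 ← "35" } (a single value is a one-element set).
slice_cube = count_cube(restrict(CLASSIFIER, {"d1": {"35"}}), MEASURE)
# Dice: Σ = { d1 ← {"28"}, d2 ← {"Madrid", "Kyoto"} }.
dice_cube = count_cube(
    restrict(CLASSIFIER, {"d1": {"28"}, "d2": {"Madrid", "Kyoto"}}), MEASURE)
print(slice_cube, dice_cube)
```

Treating slice as a dice with a singleton set keeps one code path for both operations.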
40. Roll-up and drill-down
Query: Find the number of sites where each blogger posts,
classified by the blogger’s age and city.
c(x, d1, d2) :- x age d1, x from d2
m(x, v) :- x posted y, y on v
count
nextLevel relationship – hierarchies among nodes or edges
n1 : Blogger, n2 : City, n3 : Value, n4 : BlogPost, n5 : Site, n6 : State
e2 : from (n1 → n2), e3 : age (n1 → n3), e4 : posted (n1 → n4), e5 : on (n4 → n5), e6 : nextLevel (n2 → n6)
Roll-up: along the City dimension to the State level
c′(x, d1, d3) :- x age d1, x from d2, d2 nextLevel d3
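A roll-up replaces each city by its nextLevel value before grouping. The sketch below joins the classifier answers of the running example with a nextLevel relation. The two state names are hypothetical, since the slides give no nextLevel data, and `roll_up` is an illustrative name.

```python
from collections import defaultdict

CLASSIFIER = {("user1", "28", "Madrid"), ("user3", "35", "New York")}
MEASURE = {("user1", "blog1"), ("user1", "blog2"),
           ("user2", "blog2"), ("user3", "blog2")}

# Hypothetical nextLevel edges (City -> State); not taken from the slides.
NEXT_LEVEL = {("Madrid", "Community of Madrid"), ("New York", "New York State")}

def roll_up(classifier, next_level):
    """c'(x, d1, d3) :- x age d1, x from d2, d2 nextLevel d3."""
    return {(x, d1, d3)
            for (x, d1, d2) in classifier
            for (city, d3) in next_level
            if city == d2}

rolled = roll_up(CLASSIFIER, NEXT_LEVEL)

# Aggregate with count, grouping by the (age, state) dimensions.
groups = defaultdict(int)
for (x, d1, d3) in rolled:
    groups[(d1, d3)] += sum(1 for mx, _ in MEASURE if mx == x)
rolled_cube = dict(groups)
print(rolled_cube)
```

With several cities in one state, their bloggers would fall into the same group and their counts would add up.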
41. Empirical Evaluation
– experiments and demo –
42. Experiments
Settings: kdb+ v3.0 (64 bits) – highly efficient in-memory column store
q interpreted programming language
Dataset: DBpedia Download 3.8
Ontology and Ontology Infobox datasets
Hardware: 8-core DELL server at 2.13 GHz
16 GB of RAM
running Linux 2.6.31.14
Results: linear scale-up w.r.t. the data size
for instance materialization and query answering
43. Analytical query answering
12 patterns, 1,097 queries
c – number of triple patterns in the classifier query
v – number of dimension variables in the classifier query
m – number of triple patterns in the measure query
Queries per pattern: c1v1m1 (73), c1v1m2 (53), c1v1m3 (62), c2v1m3 (71), c3v2m3 (76), c4v3m3 (130), c5v1m3 (144), c5v2m3 (216), c5v3m3 (144), c5v4m1 (28), c5v4m2 (64), c5v4m3 (36)
[Figure: per-pattern plots with axes evaluation time (s) and number of results; legend: average, minimum, maximum.]
44. Java GUI using the Prefuse toolkit
(collaboration with Tushar Ghosh)
48. Related work
Graph cube: on warehousing and OLAP multidimensional networks [SIGMOD 2011]
→ do not handle heterogeneous graphs, nor data semantics, both central in RDF
→ only focus on counting edges in contrast with our flexible analytical queries
Business intelligence on complex graph data [EDBT/ICDT 2012 Workshops]
→ graph data aggregated in a spatial fashion (group connected nodes into regions)
→ our framework – RDF-specific + more general aggregation
No Size Fits All – Running the Star Schema Benchmark with SPARQL and
RDF Aggregate Views [ESWC 2013]
→ techniques for transforming OLAP queries into SPARQL
→ could be used to further optimize analytical query answering in our framework
The MD-join: An Operator for Complex OLAP [ICDE 2001]
→ separation between grouping and aggregation present in our analytical queries
is similar to the MD-join operator for RDWs
W3C’s SPARQL 1.1 Query Language
→ features SQL-style grouping and aggregation
→ efficient SPARQL 1.1 platforms – ideal for deploying our framework
49. Sum up and perspectives
Sum up:
Approach for specifying and exploiting an RDF data warehouse
define an analytical schema that captures the information of
interest
formalize analytical queries (or cubes) over the analytical schema
Instances of analytical schemas are RDF graphs themselves, which
makes it possible to exploit their rich semantics and heterogeneous structure.
Perspectives:
semi-automatic analytical schema design
optimized OLAP operations on analytical query results
efficient methods for deploying analytical schemas and analytical
queries in parallel contexts