Graph Query Languages: update from LDBC
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com

1. Graph Query Languages
Juan F. Sequeda, Ph.D.
Co-Founder, Capsenta
Smart Data / Graphorum Conference, January 20, 2017
@juansequeda @LDBCouncil
2. Take-away message
• Graph databases need a standardized query language
• It's complicated
• “Those who fail to learn from history are doomed to repeat it”
3. Linked Data Benchmark Council (LDBC)
• LDBC is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software.
• LDBC was established as an outcome of the LDBC EU project, funded by the European Commission within the 7th Framework Programme (Grant Agreement No. 317548).
http://ldbcouncil.org/
4. LDBC Organization (non-profit)
“Sponsors”
+ non-profit members (FORTH, STI2) & personal members
+ Task Forces, volunteers developing benchmarks
+ TUC: Technical User Community (8 workshops, ~40 graph and RDF user case studies, 18 vendor presentations)
9th TUC Meeting, SAP Headquarters in Walldorf, Germany, February 9-10, 2017
http://ldbcouncil.org/9th-tuc-meeting-sap-headquarters-walldorf-germany-february-9-10-2017
5. LDBC Benchmarks
http://ldbcouncil.org/benchmarks
• Graphalytics (VLDB 2016)
• Semantic Publishing
• Social Network (SIGMOD 2015)
6. Graph Query Language Task Force
• Study query languages for graph data management systems, specifically systems storing “Property Graph” data
• The query language should cover the needs of important use cases: the Social Network Benchmark, with its interactive and BI workloads
• Devise a list of desired features and functionalities

• Renzo Angles, Universidad de Talca
• Marcelo Arenas, PUC Chile (task force lead)
• Pablo Barceló, Universidad de Chile
• Peter Boncz, Vrije Universiteit Amsterdam
• George Fletcher, Eindhoven University of Technology
• Claudio Gutierrez, Universidad de Chile
• Tobias Lindaaker, Neo Technology
• Marcus Paradies, SAP
• Raquel Pau, UPC
• Arnau Prat, UPC / Sparsity
• Juan Sequeda, Capsenta
• Oskar van Rest, Oracle Labs
• Hannes Voigt, TU Dresden
• Yinglong Xia, IBM
7. Property Graph Data Model
[Figure: two Person nodes, one with id: 123 and name: Juan Sequeda, the other with id: 456 and name: Marcelo Arenas, connected by a knows edge with since: 2010]
• Neo4j's Cypher
• Oracle's PGQL
• TinkerPop's Gremlin
8. Graph Query Languages Today
9. Standard Syntax and Semantics
• Standard
– Vendor lock-in
– Prevents true performance competition → improvement of systems
• Syntax
– Hard to define benchmarks
– Hard to compare graph DBMSs
– Applications are not portable
• Semantics
– PGQL: iso/homomorphism
– Cypher: “edge” isomorphism
– Gremlin: homomorphism
– SPARQL: homomorphism
Best Paper, WWW 2012
Angles and Gutierrez. The Expressive Power of SPARQL. ISWC 2008
Best Paper, ISWC 2006 (10 Year Award, 2016)
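To make the semantic differences above concrete, here is a minimal Python sketch, not taken from any of these systems, that matches the two-edge path pattern (v0)-[x1]->(v1)-[x2]->(v2) on a hypothetical toy graph containing a self-loop. Under homomorphism, variables may bind to the same node or edge; under edge isomorphism (the slide's characterization of Cypher) the two edge variables must bind to different edges; under node isomorphism all node variables must be distinct. The graph and the `match_path` helper are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy graph: edge id -> (source node, target node).
# e0 is a self-loop on n1.
EDGES = {
    "e0": ("n1", "n1"),
    "e1": ("n1", "n2"),
}


def match_path(edges, length, semantics):
    """Match (v0)-[x1]->(v1)-...-[xk]->(vk) with k = length.

    semantics:
      "homomorphism"     any variables may bind to the same node or edge
      "edge_isomorphism" distinct edge variables must bind to distinct edges
      "isomorphism"      distinct node variables must bind to distinct nodes
    Returns a list of (node_tuple, edge_tuple) matches.
    """
    matches = []
    for combo in product(sorted(edges), repeat=length):
        # Consecutive edges must chain: the target of x_i is the source of x_{i+1}.
        if any(edges[combo[i]][1] != edges[combo[i + 1]][0] for i in range(length - 1)):
            continue
        nodes = (edges[combo[0]][0],) + tuple(edges[e][1] for e in combo)
        if semantics == "edge_isomorphism" and len(set(combo)) < length:
            continue
        if semantics == "isomorphism" and len(set(nodes)) < len(nodes):
            continue
        matches.append((nodes, combo))
    return matches


for sem in ("homomorphism", "edge_isomorphism", "isomorphism"):
    print(sem, match_path(EDGES, 2, sem))
```

With the self-loop e0, homomorphism returns two matches (one of them reuses e0 for both edge variables), edge isomorphism keeps only the match (e0, e1), and node isomorphism returns nothing, because that match binds n1 to both v0 and v1.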
10. Closed Language
[Figure: SQL compared with a Graph Query Language*]
however …
11. Path Support
• Conjunctive Query (CQ)
• Regular Path Query (RPQ): regular expression over edge labels
• Conjunctive Regular Path Query (CRPQ)
• Union of Conjunctive Regular Path Queries (UCRPQ)
• Path matching is supported. What about further processing of paths?
• For example, require for an (a,b)* RPQ that the sum of some property on all b edges along the path is greater than some given threshold
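The contrast between path matching and further processing of paths can be sketched in Python. This is a minimal illustration under stated assumptions, not any system's algorithm: we read (a,b)* as (a·b)*, that is, an a-edge followed by a b-edge, repeated; the graph, its `weight` property and both helpers are hypothetical. `rpq_targets` answers the plain RPQ by breadth-first search over the product of graph nodes and automaton states. The threshold query cannot be answered from reachability alone, because the sum depends on the particular path, so `rpq_paths_over_threshold` enumerates matching paths explicitly, bounded by an assumed `max_len` since cycles could otherwise make the enumeration infinite.

```python
from collections import deque

# Hypothetical edge-labelled graph: (edge_id, source, label, target, weight).
EDGES = [
    ("e1", "u", "a", "v", 0),
    ("e2", "v", "b", "w", 5),
    ("e3", "w", "a", "x", 0),
    ("e4", "x", "b", "y", 2),
    ("e5", "v", "b", "z", 1),
]

# DFA for (a b)*: state 0 is both the start state and the only accepting state.
DFA_START = 0
DFA_ACCEPT = {0}
DFA_DELTA = {(0, "a"): 1, (1, "b"): 0}


def out_edges(edges, node):
    return [e for e in edges if e[1] == node]


def rpq_targets(edges, source):
    """Nodes reachable from source by a path whose label word is in (a b)*."""
    seen = {(source, DFA_START)}
    queue = deque([(source, DFA_START)])
    targets = set()
    while queue:
        node, state = queue.popleft()
        if state in DFA_ACCEPT:
            targets.add(node)
        for _eid, _src, label, dst, _w in out_edges(edges, node):
            nxt = DFA_DELTA.get((state, label))
            if nxt is not None and (dst, nxt) not in seen:
                seen.add((dst, nxt))
                queue.append((dst, nxt))
    return targets


def rpq_paths_over_threshold(edges, source, threshold, max_len=10):
    """Non-empty (a b)* paths from source, up to max_len edges, whose b-edge
    weights sum to more than threshold. Returns (target, edge_ids, total)."""
    results = []

    def dfs(node, state, path, total):
        if state in DFA_ACCEPT and path and total > threshold:
            results.append((node, list(path), total))
        if len(path) == max_len:
            return
        for eid, _src, label, dst, w in out_edges(edges, node):
            nxt = DFA_DELTA.get((state, label))
            if nxt is None:
                continue
            path.append(eid)
            dfs(dst, nxt, path, total + (w if label == "b" else 0))
            path.pop()

    dfs(source, DFA_START, [], 0)
    return results


print(sorted(rpq_targets(EDGES, "u")))
print(rpq_paths_over_threshold(EDGES, "u", threshold=4))
```

Here z is an RPQ answer for source u, but it drops out of the threshold query because its only matching path has a b-weight sum of 1.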
12. Desired Query Functionalities
• Adjacency queries
• Graph pattern matching
• Navigational queries
• Aggregate queries
• Subqueries
13. Adjacency queries
• Property access
– Get the firstName and lastName of a person having email "$email".
• Neighborhood of a node
– Get the firstName and lastName of the friends of a person identified by email "$email".
• K-neighborhood of a node
– Get the email, firstName and lastName of the friends of the friends of a person having email "$email", excluding the start person (i.e., get a list of recommended friends) (directed 2-neighborhood).
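A minimal Python sketch of the three adjacency queries, assuming a hypothetical in-memory graph: the person ids, emails and names below are invented, and property names follow the slide (email, firstName, lastName). Friendship is modelled as directed knows edges, as the directed 2-neighborhood requires.

```python
# Hypothetical toy data loosely modelled on the Social Network Benchmark schema.
PERSONS = {
    "p1": {"email": "ann@example.org", "firstName": "Ann", "lastName": "Lee"},
    "p2": {"email": "bob@example.org", "firstName": "Bob", "lastName": "Ray"},
    "p3": {"email": "cid@example.org", "firstName": "Cid", "lastName": "Moe"},
    "p4": {"email": "dee@example.org", "firstName": "Dee", "lastName": "Fox"},
}
# Directed knows edges: (source person, target person).
KNOWS = {("p1", "p2"), ("p1", "p4"), ("p2", "p1"), ("p2", "p3"), ("p4", "p3")}


def person_id(email):
    """Look a person up by the email property; None if nobody matches."""
    return next((pid for pid, p in PERSONS.items() if p["email"] == email), None)


def friends(pid):
    """Out-neighborhood of a person along knows edges."""
    return {dst for src, dst in KNOWS if src == pid}


def name_of(email):
    """Property access."""
    p = PERSONS[person_id(email)]
    return (p["firstName"], p["lastName"])


def friend_names(email):
    """Neighborhood of a node."""
    return sorted(
        (PERSONS[f]["firstName"], PERSONS[f]["lastName"]) for f in friends(person_id(email))
    )


def recommended_friends(email):
    """Directed 2-neighborhood, excluding the start person."""
    start = person_id(email)
    two_hops = {g for f in friends(start) for g in friends(f)} - {start}
    return sorted(
        (PERSONS[g]["email"], PERSONS[g]["firstName"], PERSONS[g]["lastName"]) for g in two_hops
    )


print(name_of("ann@example.org"))
print(friend_names("ann@example.org"))
print(recommended_friends("ann@example.org"))
```

Ann is reachable from herself in two hops (via Bob), which is why the start person has to be excluded explicitly.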
14. Graph Pattern Matching
• Join
– Get the creationDate and content of the messages created by a person identified by email "$email1" and commented on by another person identified by email "$email2".
• Union
– Get the creationDate and content of the messages either created or liked by a person identified by email "$email".
• Intersection
– Get the email, firstName and lastName of the common friends of two persons identified by emails "$email1" and "$email2", respectively.
• Difference
– Given two friends identified by emails "$email1" and "$email2", respectively, get the email, firstName and lastName of the friends of the second person who are not friends of the first person (this question is relevant for friendship recommendations).
• Optional
– Given a person identified by email "$email", get the title of all the messages created by that person, and the content of the first comment replying to each message (if it exists).
• Filter
– Get the properties of the people whose firstName includes the string "xxx" (this implies the use of wildcards).
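Four of these pattern-matching operations map directly onto set operations, which the following minimal Python sketch shows on hypothetical toy data (all ids, emails, names, titles and dates are invented). Intersection and difference become set intersection and set difference over friend sets; Optional keeps a message even when it has no reply, returning None in place of the comment; Filter uses plain substring containment, which corresponds to a `LIKE '%xxx%'` style wildcard match. Join and Union are omitted for brevity.

```python
# Hypothetical toy data: persons, directed knows edges, messages and comments.
PERSONS = {
    "p1": {"email": "ann@example.org", "firstName": "Ann", "lastName": "Lee"},
    "p2": {"email": "bob@example.org", "firstName": "Bob", "lastName": "Ray"},
    "p3": {"email": "cid@example.org", "firstName": "Cid", "lastName": "Moe"},
    "p4": {"email": "dee@example.org", "firstName": "Dee", "lastName": "Fox"},
}
KNOWS = {("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p2", "p4")}
MESSAGES = {
    "m1": {"creator": "p1", "title": "Hello", "creationDate": "2017-01-01"},
    "m2": {"creator": "p1", "title": "Graphs", "creationDate": "2017-01-05"},
}
COMMENTS = [
    {"replyOf": "m1", "content": "Welcome!", "creationDate": "2017-01-03"},
    {"replyOf": "m1", "content": "Hi Ann", "creationDate": "2017-01-02"},
]


def person_id(email):
    return next((pid for pid, p in PERSONS.items() if p["email"] == email), None)


def friends(pid):
    return {dst for src, dst in KNOWS if src == pid}


def contact(pid):
    p = PERSONS[pid]
    return (p["email"], p["firstName"], p["lastName"])


def common_friends(email1, email2):
    """Intersection of the two friend sets."""
    both = friends(person_id(email1)) & friends(person_id(email2))
    return sorted(contact(p) for p in both)


def friends_of_second_not_first(email1, email2):
    """Difference: friends of the second person minus friends of the first."""
    diff = friends(person_id(email2)) - friends(person_id(email1))
    return sorted(contact(p) for p in diff)


def titles_with_first_comment(email):
    """Optional: each message title with its earliest reply's content, or None."""
    pid = person_id(email)
    rows = []
    for mid, msg in MESSAGES.items():
        if msg["creator"] != pid:
            continue
        replies = [c for c in COMMENTS if c["replyOf"] == mid]
        # ISO-8601 date strings sort chronologically as plain strings.
        first = min(replies, key=lambda c: c["creationDate"], default=None)
        rows.append((msg["title"], first["content"] if first else None))
    return rows


def first_name_contains(fragment):
    """Filter: substring containment on firstName."""
    return [p for p in PERSONS.values() if fragment in p["firstName"]]


print(common_friends("ann@example.org", "bob@example.org"))
print(friends_of_second_not_first("ann@example.org", "bob@example.org"))
print(titles_with_first_comment("ann@example.org"))
print(first_name_contains("o"))
```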
15. Navigational queries
• Reachability
– Is there a friendship connection between two persons identified by emails "$email1" and "$email2", respectively?
• All path finding
– Get the friendship paths between two persons identified by emails "$email1" and "$email2", respectively.
• Shortest path finding
– Get the shortest friendship path between two persons identified by emails "$email1" and "$email2", respectively.
• Regular path query
– Get the firstName of the friends of the friends of the friends of a person identified by email "$email".
• Conjunctive regular path queries
– Given a target message created on "$dateTime" by a person identified by email "$email", for each comment replying to the target message, get the comment's content and the email of the comment's creator.
• Filtered regular path query
– Given a person identified by email "$email", get the title of all the messages liked by that person between "$dateTime1" and "$dateTime2".
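The first four navigational queries can be sketched in Python over a hypothetical friendship graph. To keep the sketch short, persons are identified directly by invented ids rather than looked up by email. Reachability and shortest path use breadth-first search, all path finding enumerates simple paths (no repeated person) by depth-first search, and the friend-of-friend-of-friend query is evaluated as the regular path query knows·knows·knows, i.e. walks of exactly three edges. Restricting all path finding to simple paths is our assumption; with cycles, unrestricted paths would be infinite.

```python
from collections import deque

# Hypothetical directed friendship edges between person ids.
KNOWS = {("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p1", "p3")}


def friends(pid):
    return sorted(dst for src, dst in KNOWS if src == pid)


def shortest_path(src, dst):
    """Breadth-first search; returns the nodes of one shortest path, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in friends(node):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def reachable(src, dst):
    return shortest_path(src, dst) is not None


def all_simple_paths(src, dst, path=None):
    """Depth-first enumeration of all paths that never repeat a person."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    found = []
    for nxt in friends(src):
        if nxt not in path:
            found.extend(all_simple_paths(nxt, dst, path))
    return found


def friends_at_exactly(pid, hops):
    """Persons reached by walks of exactly `hops` knows edges (hops=3 here)."""
    frontier = {pid}
    for _ in range(hops):
        frontier = {f for p in frontier for f in friends(p)}
    return frontier


print(reachable("p1", "p4"), reachable("p4", "p1"))
print(shortest_path("p1", "p4"))
print(all_simple_paths("p1", "p4"))
print(friends_at_exactly("p1", 3))
```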
16. Where are we now?
17. Decision: Property Graph Data Model
In the following definition, we assume the existence of the following sets:
• $L$ is an infinite set of (node and edge) labels;
• $P$ is an infinite set of property names;
• $V$ is an infinite set of literals (actual values).
Moreover, we assume that $\mathrm{SET}(X)$ is the set of all finite subsets of a given set $X$. Then a property graph is a tuple $G = (N, E, \rho, \lambda, \sigma)$, where:
• nodes: $N$ is a finite set of nodes;
• edges: $E$ is a finite set of edges such that $N$ and $E$ have no elements in common, i.e. $N \cap E = \emptyset$; $\rho : E \to (N \times N)$ is a total function;
• labels: $\lambda : (N \cup E) \to \mathrm{SET}(L)$ is a total function;
• properties: $\sigma : (N \cup E) \times P \to V$ is a partial function.
18. Property Graph Data Model
[Figure: the two Person nodes and the knows edge of slide 7]
• $L = \{\text{Person}, \text{knows}\}$ (node and edge labels)
• $P = \{\text{id}, \text{name}, \text{since}\}$ (property names)
• $V = \{\text{``123''}, \text{``456''}, \text{``Juan Sequeda''}, \text{``Marcelo Arenas''}, \text{``2010''}\}$ (literals)
• $N = \{n_1, n_2\}$ (nodes)
• $E = \{e_1\}$ (edges, with $N \cap E = \emptyset$)
• $\rho$ (total): $\rho(e_1) = (n_1, n_2)$
• $\lambda$ (total): $\lambda(n_1) = \{\text{Person}\}$, $\lambda(n_2) = \{\text{Person}\}$, $\lambda(e_1) = \{\text{knows}\}$
• $\sigma$ (partial): $\sigma(n_1, \text{id}) = \text{``123''}$, $\sigma(n_1, \text{name}) = \text{``Juan Sequeda''}$, $\sigma(n_2, \text{id}) = \text{``456''}$, $\sigma(n_2, \text{name}) = \text{``Marcelo Arenas''}$, $\sigma(e_1, \text{since}) = \text{``2010''}$; $\sigma$ is undefined on all other pairs
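The formal definition translates almost one-to-one into a data structure. The Python sketch below is our own illustration, not an LDBC artifact: the class name `PropertyGraph` and its `validate` method are hypothetical. It stores $\rho$, $\lambda$ and $\sigma$ as dictionaries and checks the conditions of the definition: disjoint $N$ and $E$, total $\rho$ and $\lambda$, labels as finite sets, and $\sigma$ defined only on nodes and edges. Partiality of $\sigma$ simply means some (object, property) keys are absent.

```python
from dataclasses import dataclass


@dataclass
class PropertyGraph:
    """G = (N, E, rho, lambda, sigma), following the definition above."""

    nodes: frozenset  # N
    edges: frozenset  # E
    rho: dict  # total function E -> N x N
    labels: dict  # lambda: total function (N ∪ E) -> finite set of labels
    sigma: dict  # partial function (N ∪ E) x P -> V, keyed by (object, property)

    def validate(self):
        """Check the conditions of the definition; raise ValueError if one fails."""
        objects = self.nodes | self.edges
        if self.nodes & self.edges:
            raise ValueError("N and E must have no elements in common")
        if set(self.rho) != set(self.edges):
            raise ValueError("rho must be defined on every edge")
        if not all(s in self.nodes and t in self.nodes for s, t in self.rho.values()):
            raise ValueError("rho must map every edge to a pair of nodes")
        if set(self.labels) != set(objects):
            raise ValueError("lambda must be defined on every node and edge")
        if not all(isinstance(ls, frozenset) for ls in self.labels.values()):
            raise ValueError("lambda must map to finite sets of labels")
        if not all(obj in objects for obj, _prop in self.sigma):
            raise ValueError("sigma may only be defined on nodes and edges")
        return True


# The two-person example from the slides.
G = PropertyGraph(
    nodes=frozenset({"n1", "n2"}),
    edges=frozenset({"e1"}),
    rho={"e1": ("n1", "n2")},
    labels={"n1": frozenset({"Person"}), "n2": frozenset({"Person"}), "e1": frozenset({"knows"})},
    sigma={
        ("n1", "id"): "123",
        ("n1", "name"): "Juan Sequeda",
        ("n2", "id"): "456",
        ("n2", "name"): "Marcelo Arenas",
        ("e1", "since"): "2010",
    },
)
print(G.validate())
print(G.sigma.get(("e1", "name")))  # sigma is partial: an undefined pair yields None
```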
19. Unified Data Model Options
• Extend the relational model (where cells can contain actual values such as strings, integers, dates, …) with objects of type NODE, EDGE, PATH or GRAPH
– Embedding graphs in a relational database
– Similar to Postgres + JSON datatype
– What happens if you join on a GRAPH column?
20. Unified Data Model Options
• “Natural” graph data model

Data graph G (nodes 1-6, edges 10-16):
id | label | E.id | E.dest
1 | purple | 10 | 2
2 | green | 11 | 3
  |        | 13 | 5
3 | green | 12 | 4
  |        | 14 | 6
4 | purple | (no out edges)
5 | green | 15 | 6
6 | purple | 16 | 4

Query 1:
SELECT x, y
FROM G (x:purple)-[e:*]->(y:purple)

Result as a relation:
id | x | y
90 | 1 | 4
91 | 1 | 6

Result as a graph: new nodes because of relation projection
id | x | y | E.id | E.dest
90 | 1 | 4 | (no out edges)
91 | 1 | 6 | (no out edges)
21. (1) Paths make things “weird”!
Query 2a:
SELECT x, y, CHEAPEST PATH edgeId
FROM G (x:purple)-[e:*]->(y:purple)

[Figure: data graph G, as on slide 20]

Result as a relation:
id | x | y
90 | 1 | 4
91 | 1 | 6

Result as a graph:
id | x | y | E.id | E.dest | E.edgeId | E.order
90 | 1 | 4 | 20 | 90 | 10 | 1
   |   |   | 21 | 90 | 11 | 2
   |   |   | 22 | 90 | 12 | 3
91 | 1 | 6 | 23 | 91 | 10 | 1
   |   |   | 24 | 91 | 13 | 2
   |   |   | 25 | 91 | 15 | 3

Paths are self edges with an order.
22. (2) Paths make things “weird”!
Query 2b:
SELECT x, y, p
FROM G (x:purple)-[CHEAPEST p:+]->(y:purple)

[Figure: data graph G, as on slide 20]

Result as a relation:
id | x | y
90 | 1 | 4
91 | 1 | 6

Result as a graph:
id | x | y | p | E.id | E.dest
90 | 1 | 4 | [10,11,12] | (no out edges)
91 | 1 | 6 | [10,13,15] | (no out edges)

Paths are datatypes.
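Treating a path as an ordinary value, as in Query 2b, can be sketched in Python on the data graph G of slide 20. This is a hypothetical evaluator, not the semantics of any proposed language: the slides do not give edge costs, so we assume unit cost (CHEAPEST = fewest edges), break ties by exploring lower edge ids first, and read `+` as "at least one edge". Under these assumptions the sketch returns [10, 11, 12] for (1, 4) as on the slide, but picks [10, 11, 14] for (1, 6), which is exactly as short as the slide's [10, 13, 15]. It also returns the pair (6, 4), reached through edge 16.

```python
from collections import deque

# Data graph G from the slides: node id -> label, edge id -> (source, target).
NODES = {1: "purple", 2: "green", 3: "green", 4: "purple", 5: "green", 6: "purple"}
EDGES = {10: (1, 2), 11: (2, 3), 13: (2, 5), 12: (3, 4), 14: (3, 6), 15: (5, 6), 16: (6, 4)}


def cheapest_path(src, dst):
    """Fewest-edge path with at least one edge, as a list of edge ids, or None.

    Assumes unit edge cost; ties are broken by exploring lower edge ids first.
    """
    queue = deque([(src, [])])
    discovered = set()
    while queue:
        node, path = queue.popleft()
        for eid in sorted(e for e, (s, _t) in EDGES.items() if s == node):
            nxt = EDGES[eid][1]
            new_path = path + [eid]
            if nxt == dst:
                return new_path
            if nxt not in discovered:
                discovered.add(nxt)
                queue.append((nxt, new_path))
    return None


def query_2b():
    """SELECT x, y, p FROM G (x:purple)-[CHEAPEST p:+]->(y:purple)."""
    purple = sorted(n for n, label in NODES.items() if label == "purple")
    rows = []
    for x in purple:
        for y in purple:
            p = cheapest_path(x, y)
            if p is not None:
                rows.append((x, y, p))  # the path p is an ordinary list value
    return rows


for row in query_2b():
    print(row)
```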
23. (1) Graph Projection (instead of Relation Projection)
Query 3a:
SELECT (x), (y)
FROM G (x:orange)-[:+]->(y:orange)

[Figure: data graph G, as on slide 20]

Result as a graph:
id | label | E.id | E.dest
1 | purple | (no out edges)
4 | purple | (no out edges)
6 | purple | (no out edges)

For comparison, the relation projection:
id | x | y
90 | 1 | 4
91 | 1 | 6

id | x | y | E.id | E.dest
90 | 1 | 4 | (no out edges)
91 | 1 | 6 | (no out edges)

Remember: this is the relational projection.
24. (2) Graph Projection (instead of Relation Projection)
Query 3b:
SELECT (x)--(y)
FROM G (x:purple)-[:+]->(y:purple)

[Figure: data graph G, as on slide 20]

Result as a graph:
id | label | E.id | E.dest
1 | purple | 21 | 4
  |        | 22 | 6
4 | purple | 23 | 6
6 | purple | (no out edges)
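The difference between graph projection and relation projection is that the result is again a graph rather than a set of rows. A minimal Python sketch of Query 3b on the data graph G, under these assumptions: `[:+]` means one or more edges, the result keeps only the matched nodes, and each matched (x, y) pair becomes one fresh edge stored as an (x, y) tuple. The fresh ids start at a hypothetical `first_edge_id=21` to echo the slide's numbering. In this sketch the third new edge runs from node 6 to node 4, following edge 16 of G.

```python
from collections import deque

# Data graph G from the slides: node id -> label, edge id -> (source, target).
NODES = {1: "purple", 2: "green", 3: "green", 4: "purple", 5: "green", 6: "purple"}
EDGES = {10: (1, 2), 11: (2, 3), 13: (2, 5), 12: (3, 4), 14: (3, 6), 15: (5, 6), 16: (6, 4)}


def reachable_from(src):
    """Nodes reachable from src by one or more edges (the [:+] pattern)."""
    seen = set()
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for s, t in EDGES.values():
            if s == node and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def query_3b(first_edge_id=21):
    """SELECT (x)--(y) FROM G (x:purple)-[:+]->(y:purple), returned as a graph.

    Result: the matched nodes plus one fresh edge per matched (x, y) pair.
    """
    purple = sorted(n for n, label in NODES.items() if label == "purple")
    pairs = [(x, y) for x in purple for y in sorted(reachable_from(x)) if NODES[y] == "purple"]
    nodes = {n: NODES[n] for pair in pairs for n in pair}
    edges = {first_edge_id + i: pair for i, pair in enumerate(pairs)}
    return nodes, edges


result_nodes, result_edges = query_3b()
print(result_nodes)
print(result_edges)
```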
25. (1) Graph Projection with Paths
Query 4a:
SELECT (PATH p)
FROM G (x:orange)-[CHEAPEST p:*]->(y:orange)
WHERE y.id = 4

[Figure: data graph G, as on slide 20]

Result as a graph:
id | label | E.id | E.dest
1 | purple | 10 | 2
2 | green | 11 | 3
3 | green | 12 | 4
4 | purple | (no out edges)
6 | purple | 16 | 6
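Projecting whole paths yields the subgraph formed by the edges of every cheapest path. The Python sketch below evaluates Query 4a on the data graph G under these assumptions: the query's endpoint label is taken to mean the purple nodes of G (the slide's result contains only purple endpoints), CHEAPEST means fewest edges with ties broken by lower edge ids, and only paths with at least one edge contribute edges (a zero-length path from node 4 to itself would add no edges, and node 4 is in the result anyway). The helper names are hypothetical.

```python
from collections import deque

# Data graph G from the slides: node id -> label, edge id -> (source, target).
NODES = {1: "purple", 2: "green", 3: "green", 4: "purple", 5: "green", 6: "purple"}
EDGES = {10: (1, 2), 11: (2, 3), 13: (2, 5), 12: (3, 4), 14: (3, 6), 15: (5, 6), 16: (6, 4)}


def cheapest_path(src, dst):
    """Fewest-edge path with at least one edge, as edge ids (unit cost assumed)."""
    queue = deque([(src, [])])
    discovered = set()
    while queue:
        node, path = queue.popleft()
        for eid in sorted(e for e, (s, _t) in EDGES.items() if s == node):
            nxt = EDGES[eid][1]
            if nxt == dst:
                return path + [eid]
            if nxt not in discovered:
                discovered.add(nxt)
                queue.append((nxt, path + [eid]))
    return None


def query_4a(target=4):
    """SELECT (PATH p) ... WHERE y.id = target: union of all cheapest paths."""
    purple = sorted(n for n, label in NODES.items() if label == "purple")
    kept = {}
    for x in purple:
        for eid in cheapest_path(x, target) or []:
            kept[eid] = EDGES[eid]
    nodes = sorted({n for s, t in kept.values() for n in (s, t)})
    return {n: NODES[n] for n in nodes}, kept


sub_nodes, sub_edges = query_4a()
print(sub_nodes)
print(sub_edges)
```

The resulting subgraph has nodes 1, 2, 3, 4 and 6 and edges 10, 11, 12 and 16, the same node and edge ids as the slide's result table.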
26. (2) Graph Projection with Paths
Query 4b:
SELECT (x)-[:{l=LENGTH OF p}]-(y)
FROM G (x:orange)-[CHEAPEST p:*]->(y:orange)
WHERE y.id = 4

[Figure: data graph G, as on slide 20]

Result as a graph, where the new edge property l holds the length of the cheapest path:
id | label | E.id | E.dest | E.l
1 | purple | 21 | 4 | 3
4 | purple | 23 | 6 | 1
6 | purple | (no out edges)
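Query 4b goes one step further: instead of copying path edges, it creates one new edge per matched (x, y) pair and stores an aggregate of the path, its length, as the edge property l. A minimal Python sketch under the same assumptions as before (purple endpoints, unit cost, at least one edge, so node 4 does not get a zero-length edge to itself); the fresh edge ids are numbered consecutively from a hypothetical `first_edge_id`, so they need not match the slide's ids.

```python
from collections import deque

# Data graph G from the slides: node id -> label, edge id -> (source, target).
NODES = {1: "purple", 2: "green", 3: "green", 4: "purple", 5: "green", 6: "purple"}
EDGES = {10: (1, 2), 11: (2, 3), 13: (2, 5), 12: (3, 4), 14: (3, 6), 15: (5, 6), 16: (6, 4)}


def cheapest_path(src, dst):
    """Fewest-edge path with at least one edge, as edge ids (unit cost assumed)."""
    queue = deque([(src, [])])
    discovered = set()
    while queue:
        node, path = queue.popleft()
        for eid in sorted(e for e, (s, _t) in EDGES.items() if s == node):
            nxt = EDGES[eid][1]
            if nxt == dst:
                return path + [eid]
            if nxt not in discovered:
                discovered.add(nxt)
                queue.append((nxt, path + [eid]))
    return None


def query_4b(target=4, first_edge_id=21):
    """SELECT (x)-[:{l=LENGTH OF p}]-(y) ... WHERE y.id = target."""
    purple = sorted(n for n, label in NODES.items() if label == "purple")
    edges = {}
    next_id = first_edge_id
    for x in purple:
        p = cheapest_path(x, target)
        if p is None:
            continue
        edges[next_id] = (x, target, {"l": len(p)})  # l = LENGTH OF p
        next_id += 1
    nodes = sorted({n for s, t, _props in edges.values() for n in (s, t)})
    return {n: NODES[n] for n in nodes}, edges


new_nodes, new_edges = query_4b()
print(new_nodes)
print(new_edges)
```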
27. Wrap up
• We want to hear from you!
• 9th LDBC Technical User Community Meeting, SAP Headquarters in Walldorf, Germany, February 9-10, 2017
• http://ldbcouncil.org/9th-tuc-meeting-sap-headquarters-walldorf-germany-february-9-10-2017
28. THANK YOU
Juan Sequeda, Ph.D.
Co-Founder, Capsenta
juan@capsenta.com
@juansequeda
Sequeda J. Integrating Relational Databases with the Semantic Web. IOS Press. 2016
http://www.iospress.nl/book/integrating-relational-databases-with-the-semantic-web/