Relational databases were conceived to digitize paper forms and automate well-structured business processes, and they still have their uses. But an RDBMS cannot model or store data and its relationships without added complexity, so performance degrades as the number and depth of relationships and the size of the data grow. Additionally, new types of data and relationships require schema redesign, which increases time to market.
A graph database like Neo4j naturally stores, manages, analyzes, and uses data within the context of its connections. As a result, Neo4j provides faster query performance and far greater flexibility in handling complex hierarchies than a relational database queried with SQL.
1. RDBMS to Graphs: Harnessing the Power of the Graph
September 2015
Ryan Boyd (@ryguyrg)
2. Agenda
• Origins of Neo4j
• Benefits of Graphs
• Designing your Graph Model
• Query time!
• Fitting Neo4j into your Enterprise Architecture
• Q&A
3. Neo Technology Overview
Product
• Neo4j - World’s leading graph database
• 150+ enterprise subscription customers including over 50 of the Global 2000
Company
• Neo Technology, Creator of Neo4j
• 100 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding
4. Neo4j Adoption by Selected Verticals
• Financial Services
• Communications
• Health & Life Sciences
• HR & Recruiting
• Media & Publishing
• Social Web
• Industry & Logistics
• Entertainment
• Consumer Retail
• Information Services
• Business Services
5. How Customers Use Neo4j
• Network & Data Center
• Master Data Management
• Social
• Recommendations
• Identity & Access
• Search & Discovery
• GEO
6. Neo4j Leads the Graph Database Revolution
“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”
“Neo4j is the current market leader in graph databases.”
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
Sources:
• IT Market Clock for Database Management Systems, 2014: https://www.gartner.com/doc/2852717/it-market-clock-database-management
• TechRadar™: Enterprise DBMS, Q1 2014: http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
• Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates): http://blogs.enterprisemanagement.com/dennisdrogseth/2013/11/06/graph-databasesand-potential-transform-capture-interdependencies/
7. High Business Value in Data Relationships
Data is increasing in volume…
• New digital processes
• More online transactions
• New social networks
• More devices
… and is getting more connected
Customers, products, processes, devices interact and relate to each other
Using Data Relationships unlocks value
• Real-time recommendations
• Fraud detection
• Master data management
• Network and IT operations
• Identity and access management
• Graph-based search
Early adopters became industry leaders
8. Relational DBs Can’t Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with number and levels of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when data relationships are valuable in real-time
Consequences: slow development, poor performance, low scalability, hard to maintain
11. Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
Figure: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), connected by LOVES, LOVES and LIVES WITH relationships; a relationship property reads since: Jan 10, 2011.
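As a rough illustration of the components listed on this slide, the following Python sketch models labeled nodes and typed, directed relationships, both carrying name-value properties. The class names, the `outgoing` helper and the direction chosen for LIVES_WITH are assumptions made for the example; this is not Neo4j's API or storage format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    labels: List[str]                                   # e.g. ["PERSON"]
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                                         # direction: start -> end
    end: Node
    type: str                                           # e.g. "LOVES"
    properties: Dict[str, Any] = field(default_factory=dict)


# Nodes taken from the slide's figure.
dan = Node(["PERSON"], {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node(["PERSON"], {"name": "Ann", "born": "Dec 5, 1975"})
car = Node(["CAR"], {"brand": "Volvo", "model": "V70"})

# Relationship types from the figure; the LIVES_WITH direction is assumed.
relationships = [
    Relationship(dan, ann, "LOVES"),
    Relationship(ann, dan, "LOVES"),
    Relationship(dan, ann, "LIVES_WITH"),
]


def outgoing(node: Node, rel_type: str) -> List[Node]:
    """Follow relationships of one type that start at `node`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```

Keeping the relationship as its own object, rather than a foreign key on a node, is what lets it carry a type, a direction and properties of its own.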
12. Relational Versus Graph Models
Relational Model: Person and Friend tables linked through a Person-Friend join table
Graph Model: ANDREAS, TOBIAS, MICA and DELIA as nodes connected by KNOWS relationships
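The contrast on this slide can be sketched in a few lines of Python. The relational side uses SQLite from the standard library with a hypothetical `person` table and `person_friend` join table; the graph side is a plain adjacency dictionary. Which person knows whom is assumed here, since the slide's diagram does not survive in the text.

```python
import sqlite3

# Relational model: friendships live in a join table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_friend (person_id INTEGER, friend_id INTEGER);
    INSERT INTO person VALUES (1, 'Andreas'), (2, 'Tobias'), (3, 'Mica'), (4, 'Delia');
    INSERT INTO person_friend VALUES (1, 2), (1, 3), (1, 4);
""")


def friends_relational(name):
    """Resolve friends by joining person -> person_friend -> person."""
    rows = conn.execute(
        """
        SELECT f.name
        FROM person p
        JOIN person_friend pf ON pf.person_id = p.id
        JOIN person f ON f.id = pf.friend_id
        WHERE p.name = ?
        ORDER BY f.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


# Graph model: each node holds direct references to its KNOWS neighbours.
knows = {"Andreas": ["Tobias", "Mica", "Delia"]}


def friends_graph(name):
    """Resolve friends by following KNOWS edges from the node."""
    return sorted(knows.get(name, []))


print(friends_relational("Andreas"), friends_graph("Andreas"))
```

Each extra hop on the relational side adds another pair of joins, while the graph side only repeats the neighbour lookup.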
29. Basic Query: Who do people report to?
MATCH
(:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})
Figure: Steven and Andrew shown as NODEs joined by a REPORTS_TO relationship, with the LABEL (Employee) and PROPERTY (firstName) of each node called out.
30. Basic Query: Who do people report to?
MATCH
(e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN
*
33. Real Query from a Customer
Find all direct reports and how many people they manage, each up to 3 levels down
34. SQL Query
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
FROM (
SELECT manager.pid AS directReportees, 0 AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
UNION
SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
UNION
SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
FROM (
SELECT manager.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
UNION
SELECT reportee.pid AS directReportees,
count(reportee.directly_manages) AS count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lN
GROUP BY directReportees
UNION
SELECT L1Reportees.pid AS directReportees,
count(L2Reportees.directly_manages) AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lN
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT T.directReportees AS directReportees, sum(T.count) AS count
OUTER UNIONS
FROM(
SELECT reportee.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
JOIN person_reportee reportee
ON manager.directly_manages = reportee.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lNam
GROUP BY directReportees
count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
ON L1Reportees.directly_manages = L2Reportees.pid
WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lN
GROUP BY directReportees
) AS T
GROUP BY directReportees)
UNION
(SELECT L2Reportees.directly_manages AS directReportees, 0 AS count
FROM person_reportee manager
JOIN person_reportee L1Reportees
ON manager.directly_manages = L1Reportees.pid
JOIN person_reportee L2Reportees
35. Real Query from a Customer
Find all direct reports and how many people they manage, up to 3 levels down
Cypher Query
MATCH
(manager)-[:REPORTS_TO*0..3]->(boss),
(report)-[:REPORTS_TO*1..3]->(manager)
WHERE
boss.name = "John Doe"
RETURN
manager.name AS Manager, count(report) AS TotalReports
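For comparison, the same question can be asked of a relational store with a recursive common table expression instead of hand-unrolled joins. This is a different technique from the customer's SQL shown on the previous slide, and the sketch below is only an illustration: it uses SQLite from Python's standard library, and the `reports_to` table, the sample names and the `report_counts` helper are all hypothetical. Like the Cypher query, it omits people who have no reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports_to (employee TEXT, manager TEXT);
    INSERT INTO reports_to VALUES
        ('Ann', 'John Doe'), ('Bob', 'John Doe'),
        ('Cat', 'Ann'), ('Dev', 'Ann'),
        ('Eve', 'Cat'), ('Fay', 'Eve'), ('Gus', 'Fay');
""")

QUERY = """
WITH RECURSIVE
  team(name, depth) AS (
    -- the boss plus everyone up to 3 levels below (REPORTS_TO*0..3)
    SELECT :boss, 0
    UNION ALL
    SELECT r.employee, t.depth + 1
    FROM reports_to r JOIN team t ON r.manager = t.name
    WHERE t.depth < 3
  ),
  below(manager, report, depth) AS (
    -- reports 1 to 3 levels under each team member (REPORTS_TO*1..3)
    SELECT t.name, r.employee, 1
    FROM team t JOIN reports_to r ON r.manager = t.name
    UNION ALL
    SELECT b.manager, r.employee, b.depth + 1
    FROM below b JOIN reports_to r ON r.manager = b.report
    WHERE b.depth < 3
  )
SELECT manager, COUNT(report) AS total_reports
FROM below
GROUP BY manager
ORDER BY manager
"""


def report_counts(boss):
    """Map each manager under `boss` to the number of reports within 3 levels."""
    return dict(conn.execute(QUERY, {"boss": boss}).fetchall())


print(report_counts("John Doe"))
```

Even in this form the depth bounds have to be threaded through two recursive definitions, whereas the Cypher pattern states them inline as `*0..3` and `*1..3`.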
36. Real Query from a Customer
Find all direct reports and how many people they manage, up to 3 levels down
SQL Query
Cypher Query
MATCH
(manager)-[:REPORTS_TO*0..3]->(boss),
(report)-[:REPORTS_TO*1..3]->(manager)
WHERE
boss.name = "John Doe"
RETURN
manager.name AS Manager, count(report) AS TotalReports
37. Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, up to 3 levels down
SQL Query
Cypher Query
MATCH
(sub)-[:REPORTS_TO*0..3]->(boss),
(report)-[:REPORTS_TO*1..3]->(sub)
WHERE
boss.name = "John Doe"
RETURN
sub.name AS Subordinate, count(report) AS Total
38. “We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”
Volker Pacher, Senior Developer
39. Who is in Robert’s (direct, upwards) reporting chain?
MATCH
p=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE
sub.firstName = 'Robert'
RETURN
p
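What the variable-length `REPORTS_TO*` pattern does can be pictured as repeatedly following the manager link until none is left. The Python sketch below walks such a chain over a hypothetical employee-to-manager mapping; only "Steven reports to Andrew" comes from the slides, and Robert's manager is assumed. It returns names rather than the paths that the Cypher query returns.

```python
# Hypothetical data: employee -> direct manager.
# Only "Steven reports to Andrew" appears on the slides; Robert's manager is assumed.
REPORTS_TO = {"Robert": "Steven", "Steven": "Andrew"}


def reporting_chain(name, reports_to=REPORTS_TO):
    """Return the managers above `name`, nearest first."""
    chain = []
    seen = {name}                      # guard against accidental cycles in the data
    current = name
    while current in reports_to:
        current = reports_to[current]
        if current in seen:
            break
        seen.add(current)
        chain.append(current)
    return chain


print(reporting_chain("Robert"))
```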
40. Who is in Robert’s (direct, upwards) reporting chain?
41. Who’s the Big Boss?
MATCH
p=(e:Employee)
WHERE
NOT (e)-[:REPORTS_TO]->()
RETURN
e.firstName as bigBoss
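The `NOT (e)-[:REPORTS_TO]->()` filter keeps only employees with no outgoing REPORTS_TO relationship. A minimal Python equivalent over hypothetical data (the employee list and the mapping are assumptions, not taken from the deck's dataset) might look like this:

```python
# Hypothetical data: all employees, and employee -> direct manager.
employees = ["Andrew", "Steven", "Robert"]
reports_to = {"Steven": "Andrew", "Robert": "Steven"}


def big_bosses(employees, reports_to):
    """Employees who report to nobody, i.e. have no outgoing REPORTS_TO edge."""
    return [e for e in employees if e not in reports_to]


print(big_bosses(employees, reports_to))
```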
46. Cypher vs SQL - Paths
Find Size of John’s 5th degree Network
Cypher
MATCH (u:User)-[:KNOWS*5..5]->(f5)
WHERE u.name = 'John'
RETURN count(f5) as size;
● 100k Users
● 5M Relationships
● Query took 5 min, 30s
● Returns count of 312M
Neo4j config: page-cache = 512m, heap = 4G
47. Cypher vs SQL - Paths
Find Size of John’s 5th degree Network
SQL
SELECT count(*)
FROM
user,
user_friend as uf1,
user_friend as uf2,
user_friend as uf3,
user_friend as uf4,
user_friend as uf5,
user as f5
WHERE
user.name='John' AND
user.id = uf1.user_1 AND
uf1.user_2 = uf2.user_1 AND
uf2.user_2 = uf3.user_1 AND
uf3.user_2 = uf4.user_1 AND
uf4.user_2 = uf5.user_1 AND
uf5.user_2 = f5.id;
● 100k Users
● 5M Connections
● Query took 1hr 55 mins
● Returns 312M
MySQL config: key_buffer = 2G, join_buffer_size = 2G
48. Cypher vs SQL - Paths
Optimize: Only count on JOIN table
SQL
SELECT count(*)
FROM
user,
user_friend as uf1,
user_friend as uf2,
user_friend as uf3,
user_friend as uf4,
user_friend as uf5
WHERE
user.name='John' AND
user.id = uf1.user_1 AND
uf1.user_2 = uf2.user_1 AND
uf2.user_2 = uf3.user_1 AND
uf3.user_2 = uf4.user_1 AND
uf4.user_2 = uf5.user_1;
● 100k Users
● 5M Connections
● Query took 2 min, 30s
● Returns count of 312M
MySQL config: key_buffer = 2G, join_buffer_size = 2G
49. Cypher vs SQL - Paths
Optimize: Only sum degree of last step
Cypher
MATCH (u:User)-[:KNOWS*4..4]->(f4)
WHERE u.name = 'John'
RETURN sum(size((f4)-[:KNOWS]->()))
● 100k Users
● 5M Relationships
● Query takes 12 sec
● Returns count of 312M
Neo4j config: page-cache = 512m, heap = 4G
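The idea behind this optimization is that the last hop does not need to be expanded: the number of 5-hop paths equals the sum, over every 4-hop path, of the out-degree of its end node. The Python sketch below checks that identity on a tiny hypothetical graph. As a simplification it counts walks in which a relationship may be reused, which Cypher's pattern matching does not allow, and it aggregates counts per node rather than enumerating paths the way a database would.

```python
def expand(knows, frontier):
    """Advance one hop: map each reachable node to the number of walks ending there."""
    nxt = {}
    for node, ways in frontier.items():
        for friend in knows.get(node, ()):
            nxt[friend] = nxt.get(friend, 0) + ways
    return nxt


def count_walks(knows, start, hops):
    """Count directed walks of exactly `hops` steps by expanding every hop."""
    frontier = {start: 1}
    for _ in range(hops):
        frontier = expand(knows, frontier)
    return sum(frontier.values())


def count_walks_via_degree(knows, start, hops):
    """Same count: expand hops - 1 steps, then sum the out-degree of each endpoint."""
    frontier = {start: 1}
    for _ in range(hops - 1):
        frontier = expand(knows, frontier)
    return sum(ways * len(knows.get(node, ())) for node, ways in frontier.items())


# Hypothetical KNOWS graph: person -> people they know.
knows = {
    "John": ["A", "B"],
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["John", "A"],
}

print(count_walks(knows, "John", 5) == count_walks_via_degree(knows, "John", 5))
```

Skipping the final expansion avoids materializing the largest set of intermediate results, which is consistent with the drop from minutes to seconds reported on the slide.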
50. Neo4j Clustering Architecture
Optimized for Speed & Availability at Scale
Performance Benefits
• No network hops within queries
• Real-time operations with fast and consistent response times
• Cache sharding spreads cache across cluster for very large graphs
Clustering Features
• Master-slave replication with master re-election and failover
• Each instance has its own local cache
• Horizontal scaling & disaster recovery
Figure: a Load Balancer in front of three Neo4j instances.
51. Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads with 10B+ records
• Up to 1M records per second
4.58 million things and their relationships… Loads in 100 seconds!
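52. Three Ways to Load Data into Neo4j
• MIGRATE ALL DATA: the Application uses the Graph Database for all data
• MIGRATE GRAPH DATA: the Application keeps non-graph data in the Relational Database and graph data in the Graph Database
• DUPLICATE GRAPH DATA: the Application keeps all data in the Relational Database and a copy of the graph data in the Graph Database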
54. Neo4j Fits into Your Enterprise Environment
Diagram labels:
• Application (End User)
• Graph Database Cluster: Neo4j, Neo4j, Neo4j (Data Storage and Business Rules Execution)
• Bulk Analytic Infrastructure: Graph Compute Engine, EDW, … (Data Mining and Aggregation)
• Ad Hoc Analysis (Data Scientist)
• Databases: Relational, NoSQL, Hadoop
60. Quick Start: Plan Your Project
Timeline, weeks 1 to 8:
• Learn Neo4j
• Decide on Architecture
• Import and Model Data
• Build Application
• Test Application
Deploy your app in as little as 8 weeks
PROFESSIONAL SERVICES PLAN
64. Summary of the Power of the Graph
• Take relationships and connected data seriously
• Seriously easy to model
• Serious performance
• Fits in with your Enterprise Architecture
• Easy to get started
• Fast to reap the benefits
65. RDBMS to Graphs: Harnessing the Power of the Graph
Start of Q&A
Ryan Boyd (@ryguyrg)