Building Applications
with a Graph Database
Tobias Lindaaker
Software Developer @ Neo Technology
twitter: @thobe / @neo4j / #neo4j
email: tobias@neotechnology.com
web: http://neo4j.org/
web: http://thobe.org/
CON6484
What you’ll face
๏Modeling your domain
๏Choosing your
deployment model
๏Deploying and maintaining
your application and DB
๏Evolving your application
and your domain
Most things are Surprisingly Familiar
Introducing the sample Application
Neo Technology Test Lab
๏One-Stop place for QA
•Real World cluster tests
•Benchmarks
•Charting
•Statistics
‣Uses HdrHistogram
http://giltene.github.io/HdrHistogram/
•Integrated Log analysis
‣GC logs and App logs
๏Click-and-go cluster deployment
Neo Technology Test Lab
๏2 perpetual servers
•1 database server
(could be extended to a
cluster for high availability)
•1 “Test Lab Manager”
‣Manages clusters and test
executions
‣Serves up the UI
๏Data-centric HTTP API
๏UI in pure javascript,
static files,
client-side rendering
Neo Technology Test Lab
๏All state in DB, allows for
multiple Manager instances,
greatly simplifies redeploy:
1. Start new instance for the
new manager
2. Verify that the new manager
works properly
3. Re-bind elastic IP to new
instance
4. Terminate old instance
๏No downtime on redeploy
Neo Technology Test Lab
๏Cute but useful:
Single click to SSH into a
cluster server in the browser
๏VT100 emulator in JavaScript
๏Uses com.jcraft:jsch to let the
manager connect to the server
•(only) the manager has the
private key to the servers
๏Tunnel terminal connection
through WebSocket
๏Really useful for introspection
Why did installation fail?
Analysis of requirements
๏UI for reporting and overview of activity
๏Easy to use & Easy to extend
๏API for triggering real world cluster tests from the CI system
๏Eat our own dog food
•Use Neo4j for storage needs
•Use our Cloud hosting solution
๏Make costs visible
๏Strong desire not to own hardware
Data storage/retrieval requirements
๏Store all meta-data about tests and their outcome
•The actual result data can be raw files
๏All entities can have arbitrary events attached
these should always be fetched,
used to determine state of the entity
๏Minimize the number of round-trips made to the database
Each action should preferably be only one DB call
Graph Database Queries
An overview of Cypher
๏START - the node(s) your query starts from - Not needed in Neo4j 2.0
๏MATCH - the pattern to follow from the start point(s)
this expands your search space
๏WHERE - filter instances of the pattern
this reduces your search space
๏RETURN - create a result projection
of each matching instance of the pattern
๏Patterns are described using ASCII-art
•(me)-[:FRIEND]-()-[:FRIEND]-(foaf),
(me)-[:LIKES]->()<-[:LIKES]-(foaf)
// find friends of my friends that share an interest with me
The basics in one slide
An overview of Cypher
๏CREATE - create nodes and relationships based on a pattern
๏SET - assign properties to nodes and relationships
๏DELETE - delete nodes or relationships
๏CREATE UNIQUE - as CREATE, but only if no match is found
•being superseded by MERGE in Neo4j 2.0
๏FOREACH - perform update operation for each item in a collection
Creates and Updates
Some more advanced Cypher
๏WITH - start a sub-query, carrying over only the declared variables
Similar format to return, allows the same kinds of projections
๏ORDER BY - sort the matching pattern instances by a property
Used in WITH or RETURN.
๏SKIP and LIMIT - page through results, used with ORDER BY.
๏Aggregation
•COLLECT - turn a part of a pattern instance into a collection of
that part for each matching pattern instance
Comparable to SQL's GROUP BY.
•SUM - summarize an expression for each match (like in SQL)
•AVG, MIN, MAX, and COUNT - as in SQL
Modeling your domain
Domain modeling guideline
๏Query first
๏Whiteboard first
๏Examples first
๏Redundancy - avoid
๏Thank You
Look at the top left of your keyboard!
Query First
๏Create the model to satisfy your queries
๏Do not attempt to mirror the real world
•You might do that, but it is not a goal in itself
๏Start by writing down the queries you need to satisfy
•Write using natural language
•Then analyze and formalize
๏Now you are ready to draw the model...
Whiteboard first
Example first
๏Draw one or more examples of entities in your domain
๏Do not leap straight to UML or other archetypical models
๏Once you have a few examples you can draw the model
(unless it is already clear from the examples)
Redundancy - avoid
๏Relationships can be traversed in both directions,
avoid creating “inverse” relationships
๏Don’t connect each node of a certain “type” to some node that
represents that type
•Leads to unnecessary bottlenecks
•Use the path you reached a node through to know its type
•Use labels to find start points
‣and for deciding type dynamically if multiple are possible
๏Avoid materializing information that can be inferred
•Don’t add FRIEND_OF_A_FRIEND relationships,
when you have FRIEND relationships
Domain modeling method
Method
1. Identify application/end-user goals
2. Figure out what questions to ask of the domain
3. Identify entities in each question
4. Identify relationships between entities in each question
5. Convert entities and relationships to paths
- These become the basis of the data model
6. Express questions as graph patterns
- These become the basis for queries
Thanks to Ian Robinson
1. Application/End-User Goals
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
2. Questions to ask of the Domain
Which people, who work for the same
company as me, have similar skills to me?
3. Identify Entities
•Person
•Company
•Skill
4. Identify Relationships Between Entities
•Person WORKS FOR Company
•Person HAS SKILL Skill
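5. Convert to Cypher Paths
•Person WORKS FOR Company
•Person HAS SKILL Skill
(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)
•Node: each parenthesized element, e.g. (:Person)
•Label: the name after the colon inside a node, e.g. Person, Company, Skill
•Relationship: each bracketed element, e.g. -[:WORKS_FOR]->
•Relationship Type: the name after the colon inside a relationship, e.g. WORKS_FOR, HAS_SKILL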
Consolidate Pattern
(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)
(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
Candidate Data Model
(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
[Figure: example graph. The Person nodes Ian, Jacob and Tobias each WORKS_FOR the Company node ACME, and each HAS_SKILL some of the Skill nodes Neo4j, Scala, Python and C#.]
6. Express Question as Graph Pattern
Which people, who work for the same
company as me, have similar skills to me?
[Figure: graph pattern. Two Person nodes, me and colleague, both WORKS_FOR the same Company node (company) and both HAS_SKILL the same Skill node (skill).]
Cypher Query
Which people, who work for the same
company as me, have similar skills to me?
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC
1. Graph pattern (MATCH)
2. Filter, using index if available (WHERE)
3. Create projection of result (RETURN)
First Match, Second Match, Third Match
[Figures: three slides, each overlaying the query pattern (me, colleague, company, skill) on the example graph to highlight one matching instance.]
Result of the Query
+---------+-------+-------------------+
| name    | score | skills            |
+---------+-------+-------------------+
| "Ian"   | 2     | ["Scala","Neo4j"] |
| "Jacob" | 1     | ["Neo4j"]         |
+---------+-------+-------------------+
2 rows
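To see why the query returns these rows, the same matching can be sketched in plain Java over an in-memory copy of the example graph. This is only an illustration, not how Neo4j evaluates the query: the class `SharedSkills`, its methods, and the exact HAS_SKILL edges in `sample()` are assumptions (the slides do not list every edge), chosen so the output agrees with the result table when `me` is Tobias.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SharedSkills {
    // person -> company (WORKS_FOR) and person -> skills (HAS_SKILL)
    final Map<String, String> worksFor = new LinkedHashMap<>();
    final Map<String, Set<String>> hasSkill = new LinkedHashMap<>();

    void addPerson(String name, String company, String... skills) {
        worksFor.put(name, company);
        hasSkill.put(name, new LinkedHashSet<>(Arrays.asList(skills)));
    }

    // One result row: colleague name plus the shared skills (score = number of skills).
    static final class Row {
        final String name;
        final List<String> skills;
        Row(String name, List<String> skills) { this.name = name; this.skills = skills; }
        int score() { return skills.size(); }
    }

    List<Row> similarColleagues(String me) {
        List<Row> rows = new ArrayList<>();
        for (String colleague : worksFor.keySet()) {
            if (colleague.equals(me)) continue;                              // a colleague is someone else
            if (!worksFor.get(colleague).equals(worksFor.get(me))) continue; // same company
            List<String> shared = new ArrayList<>(hasSkill.get(colleague));
            shared.retainAll(hasSkill.get(me));                              // skills both people have
            if (!shared.isEmpty()) rows.add(new Row(colleague, shared));
        }
        rows.sort((a, b) -> Integer.compare(b.score(), a.score()));          // ORDER BY score DESC
        return rows;
    }

    // Assumed sample data: the names come from the slides, the edges are a guess.
    static SharedSkills sample() {
        SharedSkills graph = new SharedSkills();
        graph.addPerson("Tobias", "ACME", "Scala", "Neo4j", "Python");
        graph.addPerson("Ian", "ACME", "Scala", "Neo4j");
        graph.addPerson("Jacob", "ACME", "Neo4j", "C#");
        return graph;
    }
}
```

Calling `SharedSkills.sample().similarColleagues("Tobias")` yields Ian with two shared skills followed by Jacob with one.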
Data Modeling Patterns
Ordered List of Entities
๏When
•Entities have a natural succession
•You need to traverse the sequence
๏You may need to identify the beginning or end
(first/last, earliest/latest, etc.)
๏Examples
•Event stream
•Episodes of a TV series
•Job history
Example: Episodes in Doctor Who
[Figure: episode nodes (Robot, The Ark in Space, The Sontaran Experiment, Genesis of the Daleks, Revenge of the Cybermen) chained by NEXT relationships and, as a second chain, by NEXT IN PRODUCTION relationships. A season node (season: 12) has FIRST and LAST relationships to episodes, and a NEXT SEASON relationship links it with the season: 11 node.]
๏Can interleave multiple lists with different semantics
Using different relationship types
๏Can organize lists into groups by group nodes
Example: Recent events
Add to list
// create the structure we want for the most recent one
MATCH (test:Test{testId:{testId}})
MERGE (recents:Recent{type:"Test"})
CREATE (recents)-[:LAST_COMPLETED_TEST]->(test)
// start a new sub-query, carrying through ‘recents’ and ‘test’
WITH recents, test
// matching the relationship we just created...
MATCH (recents)-[:LAST_COMPLETED_TEST]-> (test),
// ...ensures that ‘previous’ is a different relationship
(previousTest)<-[previous:LAST_COMPLETED_TEST]-(recents)
// if there was no previous, this sub-query will match nothing
// re-link to the previousTest
DELETE previous
CREATE (test)-[:PREVIOUS_COMPLETED_TEST]->(previousTest)
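The effect of this query can be mimicked in memory to make the pointer shuffling easier to follow. The sketch below is an assumption-laden illustration (the class `RecentTests` and its methods are hypothetical, not part of the Test Lab code): one field plays the role of the LAST_COMPLETED_TEST relationship and a map plays the role of the PREVIOUS_COMPLETED_TEST relationships.

```java
import java.util.HashMap;
import java.util.Map;

class RecentTests {
    // Stands in for (recents)-[:LAST_COMPLETED_TEST]->(test)
    private String lastCompleted;
    // Stands in for (test)-[:PREVIOUS_COMPLETED_TEST]->(previousTest)
    private final Map<String, String> previous = new HashMap<>();

    void complete(String testId) {
        String previousTest = lastCompleted;    // the old head of the list, if any
        lastCompleted = testId;                 // point the head at the new test
        if (previousTest != null) {             // nothing to re-link for the very first test
            previous.put(testId, previousTest); // re-link to the previous test
        }
    }

    String last() { return lastCompleted; }

    String previousOf(String testId) { return previous.get(testId); }
}
```

As in the Cypher version, adding the first test only sets the head; every later addition also links the new head to the old one.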
Get 5 most recently completed tests
MATCH (recents:Recent{type:"Test"}),
      (recents)-[:LAST_COMPLETED_TEST]->(last),
      tests=(last)-[:PREVIOUS_COMPLETED_TEST*0..5]->()
WITH tests ORDER BY length(tests) DESC LIMIT 1
RETURN extract(test IN nodes(tests) | test.testId) AS testIds
Get the next page of 5
MATCH (last:Test{testId:{testId}}),
      tests=(last)-[:PREVIOUS_COMPLETED_TEST*0..5]->()
WITH tests ORDER BY length(tests) DESC LIMIT 1
RETURN extract(test IN nodes(tests) | test.testId) AS testIds
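The paging queries pick the longest path of at most five PREVIOUS_COMPLETED_TEST hops from a start node. A rough in-memory equivalent, assuming a hypothetical `RecentPager` class that stores the chain in a map, is shown below. Note that a path of up to five hops contains up to six nodes, because the start node itself is included.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RecentPager {
    // test id -> id of the test completed just before it (PREVIOUS_COMPLETED_TEST)
    private final Map<String, String> previous = new HashMap<>();

    void link(String testId, String previousTestId) {
        previous.put(testId, previousTestId);
    }

    // Returns the start test followed by up to maxHops older tests,
    // like nodes() of the longest path matching *0..maxHops.
    List<String> page(String start, int maxHops) {
        List<String> ids = new ArrayList<>();
        String current = start;
        while (current != null && ids.size() <= maxHops) {
            ids.add(current);
            current = previous.get(current);
        }
        return ids;
    }
}
```

To fetch the next page, pass the last id of the previous page as the new start, which is what the `{testId}` parameter does in the second query.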
Active-Set pattern
Adding and Removing from Active Set
// Create cluster into active set
MATCH (clusters:ActiveSet{type:"Cluster"}),
(creator:User{userId:{userId}})
CREATE (clusters)-[:CLUSTER]->(cluster:Cluster{
clusterId: {clusterId},
clusterType: {clusterType}
}),
(cluster)-[:CREATED]->(:Event{timestamp:{creationDate}})
<-[:ACTION]-(creator)
// Destroy cluster (remove it from the active set)
MATCH (cluster:Cluster{clusterId:{clusterId}})<-[r:CLUSTER]-(),
(destroyer:User{userId:{userId}})
CREATE (cluster)-[:DESTROYED]->(:Event{timestamp:{destroyDate}})
<-[:ACTION]-(destroyer)
DELETE r
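The idea behind the active set is that membership is a link that can be dropped while the entity and its history stay in place. A minimal in-memory sketch of that idea follows; the class `ActiveClusters` and the string format of its events are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ActiveClusters {
    // Stands in for the CLUSTER relationships from the ActiveSet node
    private final Set<String> active = new LinkedHashSet<>();
    // Stands in for the Event nodes attached to each cluster
    private final Map<String, List<String>> events = new LinkedHashMap<>();

    void create(String clusterId, String userId) {
        active.add(clusterId);
        events.computeIfAbsent(clusterId, k -> new ArrayList<>()).add("CREATED by " + userId);
    }

    void destroy(String clusterId, String userId) {
        // Like the MATCH on the CLUSTER relationship: nothing happens unless the cluster is active
        if (active.remove(clusterId)) {
            events.get(clusterId).add("DESTROYED by " + userId);
        }
    }

    Set<String> activeIds() { return active; }

    List<String> history(String clusterId) {
        return events.getOrDefault(clusterId, new ArrayList<>());
    }
}
```

After `destroy`, the cluster no longer appears in `activeIds()`, but `history` still returns both its creation and destruction events.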
Entities and Events/Actions
๏Events/Actions often involve multiple parties
•E.g. the actor that caused the event, and the affected entity
๏Can include other circumstantial detail, which may be common to
multiple events
๏Examples:
•Patrick worked for Acme from 2001 to 2005 as a Software
Developer
•Sarah sent an email to Lucy, copying in David and Claire
๏In environments with concurrent updates,
events can be used to compute state
•No need to explicitly store state
Represent the Event/Action as a Node
[Figure: two examples. An employment node (from: 2001, to: 2005) is connected to the person Patrick (EMPLOYMENT), the role Software Developer (ROLE) and the company Acme (COMPANY). An email node (subject, content) is connected to its sender (FROM), its recipient (TO) and the two copied recipients (CC).]
Using Events to compute State
๏Every update of an entity adds an event to it
๏Every read query collects up all events for the entity
๏Entity state is computed in your (Java) code from the events
import java.util.List;

public class Cluster {
    private final List<ClusterEvent> events;

    public ClusterState getState() {
        // Start from the earliest state; events can only move it forward
        ClusterState state = ClusterState.AWAITING_LAUNCH;
        for ( ClusterEvent event : events ) {
            ClusterState candidate = event.impliedState();
            // keep the most advanced state implied by any event
            if ( candidate.compareTo( state ) > 0 )
                state = candidate;
        }
        return state;
    }
    // ...
}
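The slide leaves `ClusterState` and `ClusterEvent` undefined. A self-contained sketch, assuming both are enums and that the declaration order of the states defines their precedence, could look like this. Only `AWAITING_LAUNCH` and `impliedState()` come from the slide; every other constant and the class name `ClusterStateDemo` are hypothetical.

```java
import java.util.List;

class ClusterStateDemo {
    // Declaration order matters: Enum.compareTo() compares ordinals.
    // RUNNING and DESTROYED are assumed states, not taken from the slides.
    enum ClusterState { AWAITING_LAUNCH, RUNNING, DESTROYED }

    // Assumed event types, each implying the state the cluster is in afterwards.
    enum ClusterEvent {
        CREATED(ClusterState.AWAITING_LAUNCH),
        LAUNCHED(ClusterState.RUNNING),
        DESTROYED(ClusterState.DESTROYED);

        private final ClusterState implied;

        ClusterEvent(ClusterState implied) { this.implied = implied; }

        ClusterState impliedState() { return implied; }
    }

    // Same loop as getState(): the most advanced implied state wins,
    // regardless of the order in which events were read from the database.
    static ClusterState stateOf(List<ClusterEvent> events) {
        ClusterState state = ClusterState.AWAITING_LAUNCH;
        for (ClusterEvent event : events) {
            ClusterState candidate = event.impliedState();
            if (candidate.compareTo(state) > 0) {
                state = candidate;
            }
        }
        return state;
    }
}
```

Because the result does not depend on event order, concurrent writers can append events without coordinating on a stored state field.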
Repository pattern
๏Centralize your queries into one or a few places
๏Puts load logic (with translation from DB layer to App layer)
next to store logic (with the reverse transformation logic)
๏Simplifies testing
•If you use Java, test with Embedded Neo4j.
Interact through Cypher (for the code under test)
Verify using the object graph API
๏Simplifies model evolution - load/store & conversion encapsulated
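A repository in this sense can be quite small. The sketch below assumes a hypothetical `QueryRunner` interface standing in for whatever executes Cypher (a driver, an embedded database, or a fake in tests); the `Cluster` class and the query text are likewise illustrative rather than the Test Lab's actual code.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

class ClusterRepositorySketch {
    // Hypothetical abstraction over the database: runs a query, returns rows as maps.
    interface QueryRunner {
        List<Map<String, Object>> run(String query, Map<String, Object> parameters);
    }

    // Application-layer object, independent of how the data is stored.
    static final class Cluster {
        final String id;
        final String type;
        Cluster(String id, String type) { this.id = id; this.type = type; }
    }

    static final class ClusterRepository {
        // The query lives next to the code that translates its result.
        private static final String FIND_BY_ID =
                "MATCH (cluster:Cluster{clusterId:{clusterId}}) "
              + "RETURN cluster.clusterId AS id, cluster.clusterType AS type";

        private final QueryRunner runner;

        ClusterRepository(QueryRunner runner) { this.runner = runner; }

        // Returns null when no cluster has the given id.
        Cluster findById(String clusterId) {
            Map<String, Object> parameters =
                    Collections.<String, Object>singletonMap("clusterId", clusterId);
            List<Map<String, Object>> rows = runner.run(FIND_BY_ID, parameters);
            if (rows.isEmpty()) {
                return null;
            }
            Map<String, Object> row = rows.get(0);
            return new Cluster((String) row.get("id"), (String) row.get("type"));
        }
    }
}
```

Since the repository only depends on `QueryRunner`, a test can pass in a lambda that returns canned rows, while production code passes in a real database client.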
Find all active clusters - Neo4j 2.0
MATCH (clusters:ActiveSet{type:"Cluster"}),
      (clusters)-[:CLUSTER]->(cluster), // each active cluster
      (server)-[?:MEMBER_OF]->(cluster), // 0 or more servers
      (server)-[e]->(event:Event), // any relationship to an Event
      (event)-[?]->(details) // 0 or more details
// group by (cluster, server, e, event)
WITH cluster, server, e, event, collect(details) as eventDetails
// A second WITH to do collect-of-collect
WITH cluster, server, // group by (cluster, server)
     collect({ type: type(e),
               data: event,
               details: eventDetails }) as serverEvents
// Group the servers (with events) for each cluster
WITH cluster, collect({ server: server,
                        events: serverEvents }) as servers
MATCH (cluster)-[?:PARAMETERS]->(parameters),
      // Find all events for this cluster
      (cluster)-[e]->(event:Event)<-[:ACTION]-(actor)
RETURN cluster, servers, parameters,
       // Collect each event as a map of type, data and actor
       collect({ type: type(e),
                 data: event,
                 actor: actor} ) as events
Get Cluster by ID - Neo4j 2.0
MATCH (cluster:Cluster{clusterId:{clusterId}}), // match single cluster by ID
      (server)-[?:MEMBER_OF]->(cluster), // 0 or more servers
      (server)-[e]->(event:Event), // any relationship to an Event
      (event)-[?]->(details) // 0 or more details
// group by (cluster, server, e, event)
WITH cluster, server, e, event, collect(details) as eventDetails
// A second WITH to do collect-of-collect
WITH cluster, server, // group by (cluster, server)
     collect({ type: type(e),
               data: event,
               details: eventDetails }) as serverEvents
// Group the servers (with events) for each cluster
WITH cluster, collect({ server: server,
                        events: serverEvents }) as servers
MATCH (cluster)-[?:PARAMETERS]->(parameters),
      // Find all events for this cluster
      (cluster)-[e]->(event:Event)<-[:ACTION]-(actor)
RETURN cluster, servers, parameters,
       // Collect each event as a map of type, data and actor
       collect({ type: type(e),
                 data: event,
                 actor: actor} ) as events
Query Code Management
•Queries will have similar fragments.
•Store fragments as String constants in code
•Concatenate on load time to get full queries
•Keep all queries static - constants from load time
•Use query parameters for the things that change
•Use repository pattern to encapsulate queries
What you’ll gain
•Improves testability - all your queries are known and tested
•Improves security - no injections (parameters are values only)
•Improves performance - the query optimizer cache will love you
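One possible shape for this, assuming a hypothetical `Queries` holder class and simplified query text, is to build every query once from shared fragments when the class is loaded and to pass anything that varies as a parameter:

```java
import java.util.HashMap;
import java.util.Map;

class Queries {
    // Shared fragment, reused by every query that loads a cluster with its events.
    static final String CLUSTER_EVENTS =
            "MATCH (cluster)-[e]->(event:Event)<-[:ACTION]-(actor)\n"
          + "RETURN cluster, collect({type: type(e), data: event, actor: actor}) AS events";

    // Full queries are concatenated once, at class-load time, and never change afterwards.
    static final String ALL_ACTIVE_CLUSTERS =
            "MATCH (:ActiveSet{type:\"Cluster\"})-[:CLUSTER]->(cluster)\n"
          + "WITH cluster\n"
          + CLUSTER_EVENTS;

    static final String CLUSTER_BY_ID =
            "MATCH (cluster:Cluster{clusterId:{clusterId}})\n"
          + "WITH cluster\n"
          + CLUSTER_EVENTS;

    // Values that change per call travel as parameters, never inside the query string.
    static Map<String, Object> clusterByIdParameters(String clusterId) {
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("clusterId", clusterId);
        return parameters;
    }
}
```

Because the query strings are constants, user input can only ever reach the database as a parameter value, and the same query text is sent on every call, which lets the database reuse its cached plan.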
Multiple layers of models
Domain modeling layers
Client model (or UI model)
Application model
Database model
๏Multiple abstraction layers
๏Allows evolving the layers
independently
•Client / UI
•Application / Business logic
•Database model
๏Specialize each layer for its purpose
Implementing the domain
Choosing your deployment model
First: choosing a database!
๏First choice: Model (Relational, Graph,
Document, ...)
๏Second choice: Vendor
•Neo4j - Market leader
•OrientDB - Document/Graph/SQL
•InfiniteGraph - Objectivity as Graph
•DEX - spin off from research group
๏Different vendor, different query language:
•Cypher (Neo4j)
•Gremlin / Blueprints (TinkerPop)
Choosing your deployment model
๏Standalone DB with the Application as a
connecting client?
๏Database embedded in the Application?
๏Standalone DB with custom extensions?
๏Which client driver?
•Community developed? (endorsed)
•Roll your own?
•No “official” drivers (yet)
Standalone vs Embedded
Standalone
๏Pros:
•Familiar deployment
•Code in any language
๏Cons:
•“Interpreted” queries
•Round-trip for algorithmic queries
Embedded
๏Pros:
•Super fast
•Persistent, Transactional, infinite memory
๏Cons:
•Java Only (any JVM language)
•Your App and the DB will contend for GC
Standalone with custom extensions?
๏A tradeoff attempt to get the best of both worlds.
•Use Cypher for most queries
•Write extensions with custom queries
where performance is insufficient
๏Requires you to write Java
(other JVM languages possible, but harder)
๏Trickier and more verbose API than writing Cypher
๏Can do algorithmic things (custom code) that Cypher can't
๏Better performance in many cases
•Cypher is constantly improving - the need is diminishing
๏Not supported by Neo4j Cloud hosting providers
๏Start with Standalone, add extensions when needed
Choosing a client driver
๏Spring Data Neo4j (by Neo Technology)
๏Neography (Ruby, by Max de Marzi, now at Neo Technology)
๏Neo4jPHP (PHP, by Josh Adell)
๏Neo4jClient (.NET, by Tatham Oddie and Romiko Derbynew)
๏Py2neo (Python, by Nigel Small)
๏Neocons (Clojure, by Michael Klishin)
๏and more: neo4j.org/develop/drivers
Quite simple to write your own...
๏Focus on the Cypher HTTP endpoint
๏Convert returned JSON to something convenient to work with
•I.e. convert Nodes & Relationships to maps of properties
๏Also need the indexing HTTP endpoint
•at least for Neo4j pre 2.0
๏Less than half a day's effort, 1265 LOC (>50% test code)
public interface Cypher
{
    CypherResult execute( CypherStatement statement )
        throws CypherExecutionException;

    void addToIndex( long nodeId, String indexName,
                     String propertyKey, String propertyValue )
        throws CypherExecutionException;

    void createNodeIfAbsent( String indexName,
                             String propertyKey, String propertyValue,
                             Map<String, Object> properties )
        throws CypherExecutionException;
}

public class CypherStatement // Builder pattern
{
    public CypherStatement( String... lines ) {...}

    public CypherStatement withParameter(
            String key, Object value ) {
        ...
        return this;
    }
}
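The slide elides the body of `CypherStatement`. One way it might be filled in is sketched below; the accessors `query()` and `parameters()` are assumptions added so that a client can read the statement back, and the class is package-private here only to keep the example in a single file.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class CypherStatement {
    private final String query;
    private final Map<String, Object> parameters = new LinkedHashMap<>();

    // Each argument is one line of the query; joining keeps long queries readable in code.
    CypherStatement(String... lines) {
        this.query = String.join("\n", lines);
    }

    // Builder style: returns this so calls can be chained.
    CypherStatement withParameter(String key, Object value) {
        parameters.put(key, value);
        return this;
    }

    // Assumed accessor: the query text to send to the server.
    String query() {
        return query;
    }

    // Assumed accessor: a read-only view of the parameters to send along with the query.
    Map<String, Object> parameters() {
        return Collections.unmodifiableMap(parameters);
    }
}
```

A statement is then built in one expression, for example `new CypherStatement("MATCH (test:Test{testId:{testId}})", "RETURN test").withParameter("testId", 42)`.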
Official client coming w/ Neo4j 2.{low}
The choices we made for our Test Lab
๏Use AWS
•Mainly for EC2, but once you have bought in to AWS there are a
lot of other services that will serve you well
‣SQS for sending work between servers
‣SNS for sending messages back to the manager
‣S3 for storing files (benchmark results, logs, etc.)
๏Use Neo4j Cloud
•To have an app where we try it out ourselves
•Make backup and availability a separate concern
Deploying and maintaining
your application and DB
Operational Concerns
๏Backups
•Weekly full backups
•Daily incremental backups
•Keep logs for 48H (enable incremental backup even if a bit late)
•Why that frequency?
‣Fits the load schedule of most apps
‣Provides very good recovery ability
๏Monitoring
•JMX and Logback supported
•Notifications (e.g. Nagios) being worked on
Scaling Neo4j
๏Neo4j HA provides
•Fault tolerance by redundancy
•Read scalability by replication
•Writes at same levels as a single instance
๏Neo4j does not yet scale “horizontally”, i.e. shard automatically,
this is being worked on
•For reads your application can route queries for certain parts of
your domain to certain hosts, effectively “sharding” the cache
in Neo4j, keeping different data elements in RAM on different
machines
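Such application-side routing can be as simple as hashing a domain key to a host. The sketch below is one assumed way to do it (the class `CacheShardRouter` and the choice of key are hypothetical); it only decides where a read is sent and does not split the data itself, since every host in the HA cluster still holds the full graph.

```java
import java.util.List;

class CacheShardRouter {
    private final List<String> hosts;

    CacheShardRouter(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.hosts = hosts;
    }

    // The same domain key always maps to the same host, so that host's
    // cache stays warm for that part of the graph.
    String hostFor(String domainKey) {
        // floorMod keeps the index non-negative even for negative hash codes
        int index = Math.floorMod(domainKey.hashCode(), hosts.size());
        return hosts.get(index);
    }
}
```

A natural key is whatever identifies a mostly self-contained region of the domain, such as a customer or a project; adding or removing hosts changes the mapping, so caches need time to warm up again afterwards.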
Pitfalls and Anti-Patterns
Modeling Entities as Relationships
๏Limits data model evolution
•A relationship connects two things
•Modeling an entity as a relationship prevents it from being
related to more than two things
๏Smells:
•Lots of attribute-like properties
•Use of relationships as starting point of queries
๏Entities hidden in verbs:
•E.g. emailed, reviewed
Thanks to Ian Robinson
Example: Movie Reviews
81
[Diagram: two Person nodes (name: Tobias; name: Jonas) and two Movie nodes (title: The Hobbit; title: The Matrix), connected by three REVIEWED relationships. Each REVIEWED relationship carries the review itself as properties:]
•text: This is the ..., source: amazon.com, date: 20100515
•text: When I saw ..., source: imdb.com, date: 20121218
•text: My brother and ..., source: filmreview.org, date: 20121218
Thanks to Ian Robinson
New Requirement: Comment on Reviews
82
๏Allow users to comment on each other's reviews
๏Not possible in this model: a review is a relationship, so it cannot be connected to another entity
[Diagram: the Movie Reviews model, unchanged. Person nodes (Tobias, Jonas) and Movie nodes (The Hobbit, The Matrix) are connected by three REVIEWED relationships that hold the text, source and date of each review as properties.]
Thanks to Ian Robinson
Revised model
83
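[Diagram: the same Person nodes (name: Tobias; name: Jonas) and Movie nodes (title: The Hobbit; title: The Matrix), but each review is now a Review node of its own. A Person is connected to each Review by WROTE_REVIEW, and each Review is connected to a Movie by REVIEW_OF. The three Review nodes hold the former relationship properties:]
•text: This is the ..., source: amazon.com, date: 20100515
•text: When I saw ..., source: imdb.com, date: 20121218
•text: My brother and ..., source: filmreview.org, date: 20121218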
Thanks to Ian Robinson
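Why the revised model solves the comment requirement can be shown with a small in-memory sketch: once a review is an entity of its own, anything can refer to it, including a comment. The Comment class and the commentOn method are hypothetical; the slides only state the requirement and do not name a label or relationship type for comments.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory illustration only; in the graph these would be nodes.
final class ReviewModelSketch {
    static final class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    static final class Movie {
        final String title;
        Movie(String title) { this.title = title; }
    }

    // Formerly a REVIEWED relationship, now an entity between
    // WROTE_REVIEW (from the author) and REVIEW_OF (to the movie).
    static final class Review {
        final Person author;
        final Movie movie;
        final String text;
        final List<Comment> comments = new ArrayList<>();
        Review(Person author, Movie movie, String text) {
            this.author = author;
            this.movie = movie;
            this.text = text;
        }
    }

    // Hypothetical: a comment refers to a review, which is only
    // possible because the review is no longer a relationship.
    static final class Comment {
        final Person author;
        final Review review;
        final String text;
        Comment(Person author, Review review, String text) {
            this.author = author;
            this.review = review;
            this.text = text;
        }
    }

    static Comment commentOn(Review review, Person author, String text) {
        Comment comment = new Comment(author, review, text);
        review.comments.add(comment);
        return comment;
    }
}
```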
Evolving your application and your domain
84
Updating the domain model
85
๏Query first, Whiteboard first, Examples first...
๏Update your application domain model to support
both DB model versions
•Write the new version only
๏Test, test, test
๏Re-deploy
๏Run background job to update from old model version to new
•Can be as simple as a single query...
(but can also be more complex)
๏Remove support for old model
๏Re-deploy
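The "support both DB model versions" step can be sketched as a read path that prefers the new representation and falls back to the old one. The example assumes the trade/currency refactoring shown later in this deck: in the old model the currency is a property on the trade, in the new model it is a separate node. The class name TradeCurrencyReader and its parameters are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical read path used while old and new model versions coexist.
final class TradeCurrencyReader {
    /**
     * @param tradeProperties     properties of the trade node as read from the DB
     * @param relatedCurrencyCode code of the related currency node (new model),
     *                            or null if the trade has not been migrated yet
     */
    static Optional<String> currencyOf(Map<String, Object> tradeProperties,
                                       String relatedCurrencyCode) {
        // New model version: prefer the related node when it exists.
        if (relatedCurrencyCode != null) {
            return Optional.of(relatedCurrencyCode);
        }
        // Old model version: fall back to the legacy property.
        Object legacy = tradeProperties.get("currency");
        return legacy == null ? Optional.empty() : Optional.of(legacy.toString());
    }
}
```

Writes always produce the new version, so the fallback branch only serves data the background job has not reached yet and can be deleted in the final "remove support for old model" step.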
Refactoring your graph
Definition
•Restructure graph without changing informational semantics
Reasons
•Improve design
•Enhance performance
•Accommodate new functionality
•Enable iterative and incremental development of data model
The common ones
•Convert a Property to a Node
•Convert a Relationship to a Node
86
Thanks to Ian Robinson
Convert a Property to a Node
// find nodes that have the currency property
MATCH (t:Trade) WHERE has(t.currency)
// limit the size of the transaction
WITH t LIMIT {batchSize}
// find or create the (unique) node for this currency
MERGE (c:Currency{code:t.currency})
// create relationship to the currency node
CREATE (t)-[:CURRENCY]->(c)
// remove the property
REMOVE t.currency
// when the returned count is smaller than batchSize,
// you are done
RETURN count(t) AS numberRemoved
87
Thanks to Ian Robinson
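The query above processes at most batchSize items per transaction, so something has to call it repeatedly. A minimal sketch of that driver loop follows; the function passed in is assumed to execute the query once with the given batch size and return the count it reports. The class name BatchedRefactoring and the use of IntUnaryOperator are choices made for this illustration.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical driver for a batched refactoring query.
final class BatchedRefactoring {
    /**
     * @param runBatch  executes the query once with the given batch size
     *                  and returns the count returned by the query
     * @param batchSize maximum number of items per transaction
     * @return total number of items processed
     */
    static long runUntilDone(IntUnaryOperator runBatch, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        while (true) {
            int processed = runBatch.applyAsInt(batchSize);
            total += processed;
            // A batch smaller than batchSize means nothing is left to convert.
            if (processed < batchSize) {
                return total;
            }
        }
    }
}
```

If the number of items is an exact multiple of batchSize, the loop runs one extra batch that returns 0 before it stops.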
Convert a Relationship to a Node
// find emailed relationships
MATCH (a:User)-[r:EMAILED]->(b:User)
// limit the size of each transaction
WITH a, r, b LIMIT {batchSize}
// create a new node and relationships for it
CREATE (a)<-[:FROM]-(:Email{
content: r.content,
title: r.title
}) -[:TO]-> (b)
// delete the old relationship
DELETE r
// when the returned count is smaller than batchSize,
// you are done
RETURN count(r) AS numberDeleted
88
Thanks to Ian Robinson
Neo4j 2.0
89
Neo4j 2.0
90
๏All about making it more convenient to model data
๏“Labels” for Nodes enable you to model your types
๏Indexing performed by the database, automatically, based on Labels
๏Also adds user definable constraints, based on Labels
๏START clause is no longer needed in Cypher;
instead MATCH uses the schema information from labels used in
your query to determine the best start points.
Migrating to 2.0
91
๏Test queries with new Neo4j version
•Explicitly specify Cypher version for queries that fail
(prefix with CYPHER 1.9 - this will work with existing db)
๏Redeploy app with queries known to work on both versions
๏Update the database - rolling with HA, downtime with single db
๏Very similar process for updating the domain model...
•Create schema for your domain
with indexes to replace your manual indexes
•Make your write-queries add labels
•Update all existing data: add labels
•Change reads to use MATCH with labels instead of START
•Drop old (manual) indexes
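The first migration step, pinning failing queries to the old Cypher version, can be kept in one place in the application. The sketch below prefixes only those queries that are known to fail on the new version with CYPHER 1.9, as described on the slide. The class name CypherVersionPinning and the idea of tracking the failing queries in a set are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for running one application against both versions.
final class CypherVersionPinning {
    private static final String LEGACY_PREFIX = "CYPHER 1.9 ";
    private final Set<String> legacyQueries;

    CypherVersionPinning(Set<String> queriesThatFailOnNewVersion) {
        this.legacyQueries = new HashSet<>(queriesThatFailOnNewVersion);
    }

    // Queries that are not listed run with the database's default version.
    String prepare(String query) {
        return legacyQueries.contains(query) ? LEGACY_PREFIX + query : query;
    }
}
```

As queries are rewritten to use MATCH with labels, they are removed from the set, and the helper can be dropped once the set is empty.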
Summary
92
Building apps with Graph Databases
๏Model for your Queries, draw on a Whiteboard, using Examples,
avoid Redundancy, Thank You.
๏Use Cypher where possible,
write Java extensions if needed for performance
(frequently not needed - just update to next version)
๏Incremental modeling approach supported and pleasant!
๏Most Application Development Best Practices are the same!
๏Neo4j 2.0 makes modeling a whole lot nicer
•makes Cypher complete - no need to index manually!
93
http://neotechnology.com
Questions?

Building Applications with a Graph Database

  • 1. Building Applications with a Graph Database Tobias Lindaaker Software Developer @ Neo Technology twitter:! @thobe / @neo4j / #neo4j email:! tobias@neotechnology.com web:! http://neo4j.org/ web:! http://thobe.org/ CON6484
  • 2. What you’ll face ๏Modeling your domain ๏Choosing your deployment model ๏Deploying and maintaining your application and DB ๏Evolving your application and your domain 2Most things are Surprisingly Familiar
  • 3. Introducing the sample Application 3
  • 4. Neo Technology Test Lab 4 ๏One-Stop place for QA •Real World cluster tests •Benchmarks •Charting •Statistics ‣Uses HdrHistogram http://giltene.github.io/HdrHistogram/ •Integrated Log analysis ‣GC logs and App logs ๏Click-and-go cluster deployment
  • 5. Neo Technology Test Lab 5 ๏2 perpetual servers •1 database server (could be extended to a cluster for high availability) •1 “Test Lab Manager” ‣Manages clusters and test executions ‣Serves up the UI ๏Data-centric HTTP API ๏UI in pure javascript, static files, client-side rendering
  • 6. Neo Technology Test Lab 6 ๏All state in DB, allows for multiple Manager instances, greatly simplifies redeploy: 1. Start new instance for the new manager 2. Verify that the new manager works properly 3. Re-bind elastic IP to new instance 4. Terminate old instance ๏No downtime on redeploy
  • 7. Neo Technology Test Lab 7 ๏Cute but useful: Single click to SSH into a cluster server in the browser ๏VT100 emulator in JavaScript ๏Uses com.jcraft:jsch to let the manager connect to the server •(only) the manager has the private key to the servers ๏Tunnel terminal connection through WebSocket ๏Really useful for introspection Why did installation fail?
  • 8. Analysis of requirements ๏UI for reporting and overview of activity ๏Easy to use & Easy to extend ๏API for triggering real world cluster tests from the CI system ๏Eat our own dog food •Use Neo4j for storage needs •Use our Cloud hosting solution ๏Make costs visible ๏Strong desire not to own hardware 8
  • 9. Data storage/retrieval requirements ๏Store all meta-data about tests and their outcome •The actual result data can be raw files ๏All entities can have arbitrary events attached these should always be fetched, used to determine state of the entity ๏Minimize the number of round-trips made to the database Each action should preferably be only one DB call 9
  • 11. An overview of Cypher 11 ๏START - the node(s) your query starts from - Not needed in Neo4j 2.0 ๏MATCH - the pattern to follow from the start point(s) this expands your search space ๏WHERE - filter instances of the pattern this reduces your search space ๏RETURN - create a result projection of each matching instance of the pattern ๏Patterns are described using ASCII-art •(me)-[:FRIEND]-()-[:FRIEND]-(my_foaf) (me)-[:LIKES]->()<-[:LIKES]-(foaf) // find friends of my friends that share an interest with me The basics in one slide
  • 12. An overview of Cypher 12 ๏CREATE - create nodes and relationships based on a pattern ๏SET - assign properties to nodes and relationships ๏DELETE - delete nodes or relationships ๏CREATE UNIQUE - as CREATE, but only if no match is found •being superseded by MERGE in Neo4j 2.0 ๏FOREACH - perform update operation for each item in a collection Creates and Updates
  • 13. Some more advanced Cypher ๏WITH - start a sub-query, carrying over only the declared variables Similar format to return, allows the same kinds of projections ๏ORDER BY - sort the matching pattern instances by a property Used in WITH or RETURN. ๏SKIP and LIMIT - page through results, used with ORDER BY. ๏Aggregation •COLLECT - turn a part of a pattern instance into a collection of that part for each matching pattern instance Comparable to SQLs GROUP BY. •SUM - summarize an expression for each match (like in SQL) •AVG, MIN, MAX, and COUNT - as in SQL 13
  • 15. Domain modeling guideline 15 ๏Query first ๏Whiteboard first ๏Examples first ๏Redundancy - avoid ๏Thank You Look at the top left of your keyboard!
  • 16. Query First 16 ๏Create the model to satisfy your queries ๏Do not attempt to mirror the real world •You might do that, but it is not a goal in itself ๏Start by writing down the queries you need to satisfy •Write using natural language •Then analyze and formalize ๏Now you are ready to draw the model...
  • 18. Example first 18 ๏Draw one or more examples of entities in your domain ๏Do not leap straight to UML or other archetypical models ๏Once you have a few examples you can draw the model (unless it is already clear from the examples)
  • 19. Redundancy - avoid 19 ๏Relationships are bi-directional, avoid creating “inverse” relationships ๏Don’t connect each node of a certain “type” to some node that represents that type •Leads to unnecessary bottle necks •Use the path you reached a node through to know its type •Use labels to find start points ‣and for deciding type dynamically if multiple are possible ๏Avoid materializing information that can be inferred •Don’t add FRIEND_OF_A_FRIEND relationships, when you have FRIEND relationships
  • 21. Method 1. Identify application/end-user goals 2. Figure out what questions to ask of the domain 3. Identify entities in each question 4. Identify relationships between entities in each question 5. Convert entities and relationships to paths - These become the basis of the data model 6. Express questions as graph patterns - These become the basis for queries 21 Thanks to Ian Robinson
  • 22. 1.Application/End-User Goals 22 As an employee I want to know who in thecompany has similar skills tome So that we can exchangeknowledge Thanks to Ian Robinson
  • 23. 2. Questions to ask of the Domain 23 Which people, who work for the same company as me, have similar skills to me? As an employee I want to know who in thecompany has similar skills tome So that we can exchangeknowledge Thanks to Ian Robinson
  • 24. 3. Identify Entities 24 Which people, who work for the same company as me, have similar skills to me? •Person •Company •Skill Thanks to Ian Robinson
  • 25. 4. Identify Relationships Between Entities 25 Which people, who work for the same company as me, have similar skills to me? •Person WORKS FOR Company •Person HAS SKILL Skill Thanks to Ian Robinson
  • 26. 5. Convert to Cypher Paths 26 •Person WORKS FOR Company •Person HAS SKILL Skill Thanks to Ian Robinson
  • 27. 5. Convert to Cypher Paths 26 •Person WORKS FOR Company •Person HAS SKILL Skill NodeNode Node Node Thanks to Ian Robinson
  • 28. 5. Convert to Cypher Paths 26 •Person WORKS FOR Company •Person HAS SKILL Skill Relationship NodeNode Relationship Node Node Thanks to Ian Robinson
  • 29. 5. Convert to Cypher Paths 26 •Person WORKS FOR Company •Person HAS SKILL Skill (:Person)-[:WORKS_FOR]->(:Company), (:Person)-[:HAS_SKILL]->(:Skill) Relationship NodeNode Relationship Node Node Thanks to Ian Robinson
  • 30. 5. Convert to Cypher Paths 26 •Person WORKS FOR Company •Person HAS SKILL Skill (:Person)-[:WORKS_FOR]->(:Company), (:Person)-[:HAS_SKILL]->(:Skill) Relationship NodeNode Relationship Node Node Label Label Label Label Thanks to Ian Robinson
  • 31. 5. Convert to Cypher Paths 26 •Person WORKS FOR Company •Person HAS SKILL Skill (:Person)-[:WORKS_FOR]->(:Company), (:Person)-[:HAS_SKILL]->(:Skill) Relationship NodeNode Relationship Node Node Label Label Label Label Relationship Type Relationship Type Thanks to Ian Robinson
  • 33. Candidate Data Model (:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill) 28 name: Neo4j name: Ian name: ACME Person Company WORKS_FOR HAS_SKILL name: Jacob Person name: Tobias Person W O RKS_FO R W O RKS_FO R name: Scala name: Python name: C# SkillSkillSkillSkill HAS_SKILL HAS_SKILLHAS_SKILL HAS_SKILL HAS_SKILL HAS_SKILL Thanks to Ian Robinson
  • 34. 6. Express Question as Graph Pattern Which people, who work for the same company as me, have similar skills to me? 29 skill company Company colleagueme Person W O RKS_FO R W O RKS_FO R Skill HAS_SKILL HAS_SKILL Person Thanks to Ian Robinson
  • 35. Cypher Query Which people, who work for the same company as me, have similar skills to me? MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill) (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill) WHERE me.name = {name} RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills ORDER BY score DESC 30 skill company Company colleagueme Person W O RKS_FO R W O RKS_FO R Skill HAS_SKILL HAS_SKILL Person Thanks to Ian Robinson
  • 36. Cypher Query Which people, who work for the same company as me, have similar skills to me? MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill) (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill) WHERE me.name = {name} RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills ORDER BY score DESC 31 skill company Company colleagueme Person W O RKS_FO R W O RKS_FO R Skill HAS_SKILL HAS_SKILL Person 1. Graph pattern
  • 37. Cypher Query Which people, who work for the same company as me, have similar skills to me? MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill) (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill) WHERE me.name = {name} RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills ORDER BY score DESC 32 skill company Company colleagueme Person W O RKS_FO R W O RKS_FO R Skill HAS_SKILL HAS_SKILL Person 1. Graph pattern 2. Filter, using index if available
  • 38. Cypher Query Which people, who work for the same company as me, have similar skills to me? MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill) (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill) WHERE me.name = {name} RETURN colleague.name AS name, count(skill) AS score, collect(skill.name) AS skills ORDER BY score DESC 33 skill company Company colleagueme Person W O RKS_FO R W O RKS_FO R Skill HAS_SKILL HAS_SKILL Person 1. Graph pattern 2. Filter, using index if available 3. Create projection of result
  • 42. Result of the Query
      +-------------------------------------+
      | name    | score | skills            |
      +-------------------------------------+
      | "Ian"   | 2     | ["Scala","Neo4j"] |
      | "Jacob" | 1     | ["Neo4j"]         |
      +-------------------------------------+
      2 rows
    Thanks to Ian Robinson
  • 44. Ordered List of Entities
    ๏When
      •Entities have a natural succession
      •You need to traverse the sequence
    ๏You may need to identify the beginning or end (first/last, earliest/latest, etc.)
    ๏Examples
      •Event stream
      •Episodes of a TV series
      •Job history
    Thanks to Ian Robinson
  • 47. Example: Episodes in Doctor Who
    [Diagram: episode nodes (title: Robot, title: The Ark in Space, title: The Sontaran Experiment, title: Genesis of the Daleks, title: Revenge of the Cybermen) chained by NEXT and NEXT IN PRODUCTION relationships; group nodes (season: 11, season: 12) with NEXT SEASON, FIRST and LAST relationships]
    ๏Can interleave multiple lists with different semantics, using different relationship types
    ๏Can organize lists into groups by group nodes
    Thanks to Ian Robinson
  • 54. Add to list
      // create the structure we want for the most recent one
      MATCH (test:Test{testId:{testId}})
      MERGE (recents:Recent{type:"Test"})
      CREATE (recents)-[:LAST_COMPLETED_TEST]->(test)
      // start a new sub-query, carrying through ‘recents’ and ‘test’
      WITH recents, test
      // matching the relationship we just created...
      MATCH (recents)-[:LAST_COMPLETED_TEST]->(test),
            // ...ensures that ‘previous’ is a different relationship
            (previousTest)<-[previous:LAST_COMPLETED_TEST]-(recents)
      // if there was no previous, this sub-query will match nothing
      // re-link to the previousTest
      DELETE previous
      CREATE (test)-[:PREVIOUS_COMPLETED_TEST]->(previousTest)
  • 56. Get 5 most recently completed tests
      MATCH (recents:Recent{type:"Test"}),
            (recents)-[:LAST_COMPLETED_TEST]->(last),
            tests=(last)-[:PREVIOUS_COMPLETED_TEST*0..5]->()
      WITH tests ORDER BY length(tests) DESC LIMIT 1
      RETURN extract(test IN nodes(tests) | test.testId) AS testIds
    Get the next page of 5
      MATCH (last:Test{testId:{testId}}),
            tests=(last)-[:PREVIOUS_COMPLETED_TEST*0..5]->()
      WITH tests ORDER BY length(tests) DESC LIMIT 1
      RETURN extract(test IN nodes(tests) | test.testId) AS testIds
  • 58. Adding and Removing from Active Set
      // Create cluster into active set
      MATCH (clusters:ActiveSet{type:"Cluster"}),
            (creator:User{userId:{userId}})
      CREATE (clusters)-[:CLUSTER]->(cluster:Cluster{
                 clusterId: {clusterId}, clusterType: {clusterType} }),
             (cluster)-[:CREATED]->(:Event{timestamp:{creationDate}})
                 <-[:ACTION]-(creator)
      // Destroy cluster (remove it from the active set)
      MATCH (cluster:Cluster{clusterId:{clusterId}})<-[r:CLUSTER]-(),
            (destroyer:User{userId:{userId}})
      CREATE (cluster)-[:DESTROYED]->(:Event{timestamp:{destroyDate}})
                 <-[:ACTION]-(destroyer)
      DELETE r
  • 59. Entities and Events/Actions
    ๏Events/Actions often involve multiple parties
      •E.g. the actor that caused the event, and the affected entity
    ๏Can include other circumstantial detail, which may be common to multiple events
    ๏Examples:
      •Patrick worked for Acme from 2001 to 2005 as a Software Developer
      •Sarah sent an email to Lucy, copying in David and Claire
    ๏In environments with concurrent updates, events can be used to compute state
      •No need to explicitly store state
    Thanks to Ian Robinson
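  • 60. Represent the Event/Action as a Node
    [Diagram: two event nodes. Employment: properties from: 2001, to: 2005; connected nodes name: Patrick, title: Software Developer, name: Acme; relationships EMPLOYMENT, ROLE, COMPANY. Email: properties subject: ..., content: ...; connected person nodes including name: Sarah and name: Lucy; relationships FROM, TO, CC, CC]
    Thanks to Ian Robinson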
  • 61. Using Events to compute State
    ๏Every update of an entity adds an event to it
    ๏Every read query collects up all events for the entity
    ๏Entity state is computed in your (Java) code from the events
      public class Cluster
      {
          private final List<ClusterEvent> events;
          public ClusterState getState()
          {
              // start from the initial state, keep the most advanced state implied by any event
              ClusterState state = ClusterState.AWAITING_LAUNCH;
              for ( ClusterEvent event : events )
              {
                  ClusterState candidate = event.impliedState();
                  if ( candidate.compareTo( state ) > 0 ) state = candidate;
              }
              return state;
          }
          // ...
      }
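The event-folding idea above can be sketched as a self-contained program. This is only an illustration under stated assumptions: the state names other than AWAITING_LAUNCH, the event names, and the class name EventSourcedCluster are hypothetical and not taken from the Test Lab code. It relies on the fact that Java enums compare by declaration order.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical lifecycle states, declared in the order a cluster moves through
// them, so the enum's natural ordering (compareTo) means "further along".
enum ClusterState { AWAITING_LAUNCH, LAUNCHING, RUNNING, DESTROYED }

// Hypothetical event kinds; each one implies a state for the cluster.
enum ClusterEvent {
    CREATED(ClusterState.AWAITING_LAUNCH),
    LAUNCH_REQUESTED(ClusterState.LAUNCHING),
    SERVERS_READY(ClusterState.RUNNING),
    DESTROYED(ClusterState.DESTROYED);

    private final ClusterState implied;

    ClusterEvent(ClusterState implied) { this.implied = implied; }

    ClusterState impliedState() { return implied; }
}

final class EventSourcedCluster {
    private final List<ClusterEvent> events;

    EventSourcedCluster(List<ClusterEvent> events) { this.events = events; }

    // The state is never stored; it is the most advanced state implied by any event.
    ClusterState getState() {
        ClusterState state = ClusterState.AWAITING_LAUNCH;
        for (ClusterEvent event : events) {
            ClusterState candidate = event.impliedState();
            if (candidate.compareTo(state) > 0) state = candidate;
        }
        return state;
    }

    public static void main(String[] args) {
        // Events may arrive in any order; the result does not depend on it.
        EventSourcedCluster cluster = new EventSourcedCluster(Arrays.asList(
                ClusterEvent.SERVERS_READY, ClusterEvent.CREATED, ClusterEvent.LAUNCH_REQUESTED));
        System.out.println(cluster.getState());
    }
}
```

Because the result is a maximum over the events, concurrent writers only ever append events and never need to agree on a stored state value.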
  • 62. Repository pattern
    ๏Centralize your queries into one or a few places
    ๏Puts load logic (with translation from DB layer to App layer) next to store logic (with the reverse transformation logic)
    ๏Simplifies testing
      •If you use Java, test with Embedded Neo4j.
        Interact through Cypher (for the code under test)
        Verify using the object graph API
    ๏Simplifies model evolution - load/store & conversion encapsulated
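A minimal sketch of such a repository, assuming a hypothetical QueryRunner interface that stands in for whatever executes Cypher (embedded database or HTTP client). ClusterRecord, ClusterRepository and the two query strings are illustrative names, not the Test Lab's actual code. The point is that the store translation and the load translation sit side by side.

```java
import java.util.HashMap;
import java.util.Map;

// Application-layer object (hypothetical; fields mirror the cluster properties used above).
final class ClusterRecord {
    final String clusterId;
    final String clusterType;

    ClusterRecord(String clusterId, String clusterType) {
        this.clusterId = clusterId;
        this.clusterType = clusterType;
    }
}

// Hypothetical stand-in for the DB layer: runs a query with parameters, returns one row.
interface QueryRunner {
    Map<String, Object> run(String query, Map<String, Object> parameters);
}

final class ClusterRepository {
    // All queries for this entity live in one place, as constants.
    static final String STORE =
            "CREATE (c:Cluster{clusterId:{clusterId}, clusterType:{clusterType}})";
    static final String LOAD =
            "MATCH (c:Cluster{clusterId:{clusterId}}) "
            + "RETURN c.clusterId AS clusterId, c.clusterType AS clusterType";

    private final QueryRunner db;

    ClusterRepository(QueryRunner db) { this.db = db; }

    void store(ClusterRecord cluster) { db.run(STORE, toProperties(cluster)); }

    ClusterRecord load(String clusterId) {
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("clusterId", clusterId);
        Map<String, Object> row = db.run(LOAD, parameters);
        return row == null ? null : fromProperties(row);
    }

    // App layer -> DB layer
    static Map<String, Object> toProperties(ClusterRecord cluster) {
        Map<String, Object> properties = new HashMap<>();
        properties.put("clusterId", cluster.clusterId);
        properties.put("clusterType", cluster.clusterType);
        return properties;
    }

    // DB layer -> App layer (the reverse transformation, kept next to the one above)
    static ClusterRecord fromProperties(Map<String, Object> row) {
        return new ClusterRecord((String) row.get("clusterId"), (String) row.get("clusterType"));
    }
}
```

In a test, QueryRunner can be replaced by a fake or by an embedded database, which is what makes the repository easy to verify in isolation.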
  • 70. Find all active clusters - Neo4j 2.0
      MATCH (clusters:ActiveSet{type:"Cluster"}),
            (clusters)-[:CLUSTER]->(cluster),   // each active cluster
            (server)-[?:MEMBER_OF]->(cluster),  // 0 or more servers
            (server)-[e]->(event:Event),        // any relationship to an Event
            (event)-[?]->(details)              // 0 or more details
      // group by (cluster, server, e, event)
      WITH cluster, server, e, event, collect(details) as eventDetails
      // A second WITH to do collect-of-collect
      WITH cluster, server, // group by (cluster, server)
           collect({ type: type(e), data: event, details: eventDetails }) as serverEvents
      // Group the servers (with events) for each cluster
      WITH cluster, collect({ server: server, events: serverEvents }) as servers
      MATCH (cluster)-[?:PARAMETERS]->(parameters),
            // Find all events for this cluster
            (cluster)-[e]->(event:Event)<-[:ACTION]-(actor)
      RETURN cluster, servers, parameters,
             // Collect the events for the cluster as a list of maps
             collect({ type: type(e), data: event, actor: actor }) as events
  • 71. Get Cluster by ID - Neo4j 2.0
      MATCH (cluster:Cluster{clusterId:{clusterId}}), // match single cluster by ID
            (server)-[?:MEMBER_OF]->(cluster),  // 0 or more servers
            (server)-[e]->(event:Event),        // any relationship to an Event
            (event)-[?]->(details)              // 0 or more details
      // group by (cluster, server, e, event)
      WITH cluster, server, e, event, collect(details) as eventDetails
      // A second WITH to do collect-of-collect
      WITH cluster, server, // group by (cluster, server)
           collect({ type: type(e), data: event, details: eventDetails }) as serverEvents
      // Group the servers (with events) for each cluster
      WITH cluster, collect({ server: server, events: serverEvents }) as servers
      MATCH (cluster)-[?:PARAMETERS]->(parameters),
            // Find all events for this cluster
            (cluster)-[e]->(event:Event)<-[:ACTION]-(actor)
      RETURN cluster, servers, parameters,
             // Collect the events for the cluster as a list of maps
             collect({ type: type(e), data: event, actor: actor }) as events
  • 74. Query Code Management
    •Queries will have similar fragments.
    •Store fragments as String constants in code
    •Concatenate on load time to get full queries
    •Keep all queries static - constants from load time
    •Use query parameters for the things that change
    •Use repository pattern to encapsulate queries
    What you’ll gain
    •Improves testability - all your queries are known and tested
    •Improves security - no injections (parameters are values only)
    •Improves performance - the query optimizer cache will love you
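As a rough sketch of the fragment approach: the two queries below are simplified stand-ins for the "find all active clusters" and "get cluster by ID" queries, and the names ClusterQueries, COLLECT_EVENTS and byIdParameters are hypothetical. The shared tail is one constant, the full queries are fixed when the class loads, and the only thing that varies at runtime is the parameter map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ClusterQueries {
    // Shared fragment: both queries collect events the same way once 'cluster' is bound.
    static final String COLLECT_EVENTS =
            " MATCH (cluster)-[e]->(event:Event)<-[:ACTION]-(actor)"
            + " RETURN cluster, collect({ type: type(e), data: event, actor: actor }) AS events";

    // Full queries are assembled once, from constants only; no user input is concatenated.
    static final String ALL_ACTIVE =
            "MATCH (:ActiveSet{type:\"Cluster\"})-[:CLUSTER]->(cluster) WITH cluster"
            + COLLECT_EVENTS;
    static final String BY_ID =
            "MATCH (cluster:Cluster{clusterId:{clusterId}}) WITH cluster"
            + COLLECT_EVENTS;

    private ClusterQueries() {}

    // Values that change between executions travel as parameters, never as query text.
    static Map<String, Object> byIdParameters(String clusterId) {
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("clusterId", clusterId);
        return Collections.unmodifiableMap(parameters);
    }

    public static void main(String[] args) {
        System.out.println(BY_ID);
        System.out.println(byIdParameters("c-42"));
    }
}
```

Since the query text is identical on every call, a test suite can enumerate every query the application will ever send, and a query cache sees the same string each time.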
  • 75. Multiple layers of models
  • 76. Domain modeling layers
    [Diagram: Client model (or UI model), Application model, Database model]
    ๏Multiple abstraction layers
    ๏Allows evolving the layers independently
      •Client / UI
      •Application / Business logic
      •Database model
    ๏Specialize each layer for its purpose
  • 79. First: choosing a database!
    ๏First choice: Model (Relational, Graph, Document, ...)
    ๏Second choice: Vendor
      •Neo4j - Market leader
      •OrientDB - Document/Graph/SQL
      •InfiniteGraph - Objectivity as Graph
      •DEX - spin off from research group
    ๏Different vendor, different query language:
      •Cypher (Neo4j)
      •Gremlin / Blueprints (TinkerPop)
  • 80. Choosing your deployment model
    ๏Standalone DB with the Application as a connecting client?
    ๏Database embedded in the Application?
    ๏Standalone DB with custom extensions?
    ๏Which client driver?
      •Community developed? (endorsed)
      •Roll your own?
      •No “official” drivers (yet)
  • 81. Standalone vs Embedded
    Standalone
    ๏Pros:
      •Familiar deployment
      •Code in any language
    ๏Cons:
      •“Interpreted” queries
      •Round-trip for algorithmic queries
    Embedded
    ๏Pros:
      •Super fast
      •Persistent, Transactional, infinite memory
    ๏Cons:
      •Java Only (any JVM language)
      •Your App and the DB will contend for GC
  • 82. Standalone with custom extensions?
    ๏A tradeoff: an attempt to get the best of both worlds.
      •Use Cypher for most queries
      •Write extensions with custom queries where performance is insufficient
    ๏Requires you to write Java (other JVM languages possible, but harder)
    ๏Trickier and more verbose API than writing Cypher
    ๏Can do algorithmic things (custom code) that Cypher can't
    ๏Better performance in many cases
      •Cypher is constantly improving - the need is diminishing
    ๏Not supported by Neo4j Cloud hosting providers
    ๏Start with Standalone, add extensions when needed
  • 83. Choosing a client driver
    ๏Spring Data Neo4j (by Neo Technology)
    ๏Neography (Ruby, by Max de Marzi, now at Neo Technology)
    ๏Neo4jPHP (PHP, by Josh Adell)
    ๏Neo4jClient (.NET, by Tatham Oddie and Romiko Derbynew)
    ๏Py2neo (Python, by Nigel Small)
    ๏Neocons (Clojure, by Michael Klishin)
    ๏and more: neo4j.org/develop/drivers
  • 84. Quite simple to write your own...
    ๏Focus on the Cypher HTTP endpoint
    ๏Convert returned JSON to something convenient to work with
      •I.e. convert Nodes & Relationships to maps of properties
    ๏Also need the indexing HTTP endpoint
      •at least for Neo4j pre 2.0
    ๏Less than half a day's effort, 1265 LOC (>50% test code)
      public interface Cypher
      {
          CypherResult execute( CypherStatement statement )
              throws CypherExecutionException;
          void addToIndex( long nodeId, String indexName,
                           String propertyKey, String propertyValue )
              throws CypherExecutionException;
          void createNodeIfAbsent( String indexName,
                                   String propertyKey, String propertyValue,
                                   Map<String, Object> properties )
              throws CypherExecutionException;
      }
      public class CypherStatement // Builder pattern
      {
          public CypherStatement( String... lines ) {...}
          public CypherStatement withParameter( String key, Object value )
          {
              ...
              return this;
          }
      }
    Official client coming with Neo4j 2.{low}
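The slide elides the body of the statement builder. One possible shape, as a sketch only: SimpleCypherStatement and its accessors query() and parameters() are hypothetical names, and this is not the code the Test Lab used. It joins the query lines once and accumulates parameters fluently.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder: immutable query text plus a growing parameter map.
final class SimpleCypherStatement {
    private final String query;
    private final Map<String, Object> parameters = new LinkedHashMap<>();

    // Lines are joined with newlines so multi-line queries stay readable in logs.
    SimpleCypherStatement(String... lines) {
        this.query = String.join("\n", lines);
    }

    // Returns this, so calls can be chained in builder style.
    SimpleCypherStatement withParameter(String key, Object value) {
        parameters.put(key, value);
        return this;
    }

    String query() { return query; }

    // Read-only view; a client would serialize this next to the query text.
    Map<String, Object> parameters() {
        return Collections.unmodifiableMap(parameters);
    }

    public static void main(String[] args) {
        SimpleCypherStatement statement = new SimpleCypherStatement(
                "MATCH (test:Test{testId:{testId}})",
                "RETURN test")
                .withParameter("testId", 7);
        System.out.println(statement.query());
        System.out.println(statement.parameters());
    }
}
```

A client built around this would send the query text and the parameter map together in one request to the Cypher HTTP endpoint and convert the JSON response into maps of properties.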
  • 85. The choices we made for our Test Lab
    ๏Use AWS
      •Mainly for EC2, but once you have bought in to AWS there are a lot of other services that will serve you well
        ‣SQS for sending work between servers
        ‣SNS for sending messages back to the manager
        ‣S3 for storing files (benchmark results, logs, etc.)
    ๏Use Neo4j Cloud
      •To have an app where we try it out ourselves
      •Make backup and availability a separate concern
  • 86. Deploying and maintaining your application and DB
  • 87. Operational Concerns
    ๏Backups
      •Weekly full backups
      •Daily incremental backups
      •Keep logs for 48 hours (so an incremental backup still works even if it runs a bit late)
      •Why that frequency?
        ‣Fits the load schedule of most apps
        ‣Provides very good recovery ability
    ๏Monitoring
      •JMX and Logback supported
      •Notifications (e.g. Nagios) being worked on
  • 88. Scaling Neo4j
    ๏Neo4j HA provides
      •Fault tolerance by redundancy
      •Read scalability by replication
      •Writes at same levels as a single instance
    ๏Neo4j does not yet scale “horizontally” (i.e. shard automatically); this is being worked on
      •For reads, your application can route queries for certain parts of your domain to certain hosts, effectively “sharding” the cache in Neo4j by keeping different data elements in RAM on different machines
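The read-routing idea can be sketched in a few lines of application code. The ReadRouter class, the host names and the choice of a plain hash-modulo scheme are all assumptions made for illustration; the slides do not say how the routing should be implemented. The only property that matters is that the same domain key always lands on the same replica, so that replica's cache stays warm for that part of the graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical application-side router: picks a read replica per domain key.
final class ReadRouter {
    private final List<String> hosts;

    ReadRouter(List<String> hosts) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("need at least one host");
        this.hosts = new ArrayList<>(hosts);
    }

    // Deterministic: equal keys map to the same host on every call.
    // floorMod keeps the bucket non-negative even for negative hash codes.
    String hostFor(String domainKey) {
        int bucket = Math.floorMod(domainKey.hashCode(), hosts.size());
        return hosts.get(bucket);
    }

    public static void main(String[] args) {
        // Host names are placeholders.
        ReadRouter router = new ReadRouter(Arrays.asList("replica-a", "replica-b", "replica-c"));
        System.out.println(router.hostFor("cluster:c-42"));
        System.out.println(router.hostFor("user:tobias"));
    }
}
```

Note that every replica still holds all the data; only the working set in memory differs per host, so a misrouted read is slower but still correct.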
  • 90. Modeling Entities as Relationships
    ๏Limits data model evolution
      •A relationship connects two things
      •Modeling an entity as a relationship prevents it from being related to more than two things
    ๏Smells:
      •Lots of attribute-like properties
      •Use of relationships as starting point of queries
    ๏Entities hidden in verbs:
      •E.g. emailed, reviewed
    Thanks to Ian Robinson
  • 91. Example: Movie Reviews
    [Diagram: Person nodes (name: Tobias, name: Jonas) connected to Movie nodes (title: The Hobbit, title: The Matrix) by three REVIEWED relationships, each carrying the review as properties: (text: This is the ..., source: amazon.com, date: 20100515), (text: When I saw ..., source: imdb.com, date: 20121218), (text: My brother and ..., source: filmreview.org, date: 20121218)]
    Thanks to Ian Robinson
  • 92. New Requirement: Comment on Reviews
    ๏Allow users to comment on each other's reviews
    ๏Not possible in this model: a review is a relationship, so it can't be connected to another entity
    [Diagram: same model as the previous slide, with Person and Movie nodes connected by REVIEWED relationships that carry text, source and date properties]
    Thanks to Ian Robinson
  • 93. Revised model
    [Diagram: Person nodes (name: Tobias, name: Jonas) each connected by WROTE_REVIEW to Review nodes, which are connected by REVIEW_OF to Movie nodes (title: The Hobbit, title: The Matrix). The three Review nodes carry the properties (text: This is the ..., source: amazon.com, date: 20100515), (text: When I saw ..., source: imdb.com, date: 20121218), (text: My brother and ..., source: filmreview.org, date: 20121218)]
    Thanks to Ian Robinson
  • 94. Evolving your application and your domain
  • 95. Updating the domain model
    ๏Query first, Whiteboard first, Examples first...
    ๏Update your application domain model to support both DB model versions
      •Write the new version only
    ๏Test, test, test
    ๏Re-deploy
    ๏Run background job to update from old model version to new
      •Can be as simple as a single query... (but can also be more complex)
    ๏Remove support for old model
    ๏Re-deploy
  • 96. Refactoring your graph
    Definition
      •Restructure graph without changing informational semantics
    Reasons
      •Improve design
      •Enhance performance
      •Accommodate new functionality
      •Enable iterative and incremental development of data model
    The common ones
      •Convert a Property to a Node
      •Convert a Relationship to a Node
    Thanks to Ian Robinson
  • 97. Convert a Property to a Node
      // find nodes that have the currency property
      MATCH (t:Trade) WHERE has(t.currency)
      // limit the size of the transaction
      WITH t LIMIT {batchSize}
      // find or create the (unique) node for this currency
      MERGE (c:Currency{code:t.currency})
      // create relationship to the currency node
      CREATE (t)-[:CURRENCY]->(c)
      // remove the property
      REMOVE t.currency
      // when the returned count is smaller than batchSize,
      // you are done
      RETURN count(t) AS numberRemoved
    Thanks to Ian Robinson
  • 98. Convert a Relationship to a Node
      // find emailed relationships
      MATCH (a:User)-[r:EMAILED]->(b:User)
      // limit the size of each transaction
      WITH a, r, b LIMIT {batchSize}
      // create a new node and relationships for it
      CREATE (a)<-[:FROM]-(:Email{
                 content: r.content, title: r.title
             })-[:TO]->(b)
      // delete the old relationship
      DELETE r
      // when the returned count is smaller than batchSize,
      // you are done
      RETURN count(r) AS numberDeleted
    Thanks to Ian Robinson
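Both refactoring queries are meant to be run repeatedly until the returned count drops below the batch size. A sketch of that driver loop, with the query execution abstracted behind a callback: BatchedRefactoring, runUntilDone and the callback shape are hypothetical, and in a real application the callback would execute one of the queries above in its own transaction and return numberRemoved or numberDeleted.

```java
import java.util.function.IntUnaryOperator;

final class BatchedRefactoring {
    private BatchedRefactoring() {}

    // runBatch: takes the batch size, runs one batch, returns how many items it changed.
    // Returns the total number of items changed across all batches.
    static int runUntilDone(IntUnaryOperator runBatch, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        int total = 0;
        while (true) {
            int processed = runBatch.applyAsInt(batchSize);
            total += processed;
            // A short batch means the MATCH found fewer items than the LIMIT: nothing is left.
            if (processed < batchSize) return total;
        }
    }

    public static void main(String[] args) {
        // Simulated database with 25 items still in the old model.
        final int[] remaining = {25};
        int total = runUntilDone(size -> {
            int n = Math.min(size, remaining[0]);
            remaining[0] -= n;
            return n;
        }, 10);
        System.out.println(total);
    }
}
```

Keeping each batch in a separate transaction bounds memory use and lets the job be interrupted and resumed, since already-converted items no longer match the query.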
  • 100. Neo4j 2.0
    ๏All about making it more convenient to model data
    ๏“Labels” for Nodes enable you to model your types
    ๏Indexing performed by the database, automatically, based on Labels
    ๏Also adds user definable constraints, based on Labels
    ๏START clause is gone from Cypher; instead MATCH uses the schema information from labels used in your query to determine the best start points.
  • 101. Migrating to 2.0
    ๏Test queries with new Neo4j version
      •Explicitly specify Cypher version for queries that fail (prefix with CYPHER 1.9 - this will work with existing db)
    ๏Redeploy app with queries known to work on both versions
    ๏Update the database - rolling with HA, downtime with single db
    ๏Very similar process for updating the domain model...
      •Create schema for your domain with indexes to replace your manual indexes
      •Make your write-queries add labels
      •Update all existing data: add labels
      •Change reads to use MATCH with labels instead of START
      •Drop old (manual) indexes
  • 103. Building apps with Graph Databases
    ๏Model for your Queries, draw on a Whiteboard, using Examples, avoid Redundancy, Thank You.
    ๏Use Cypher where possible, write Java extensions if needed for performance (frequently not needed - just update to next version)
    ๏Incremental modeling approach supported and pleasant!
    ๏Most Application Development Best Practices are the same!
    ๏Neo4j 2.0 makes modeling a whole lot nicer
      •makes Cypher complete - no need to index manually!