Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Neo4j
Theory and Practice
Tareq Abedrabbo
Graph Connect - 19/11/2013

About me
• CTO/Principal Consultant at OpenCredo
• Working with Neo4j for (almost) 3 years on a number of different projects
• Co-author of Neo4j in Action (Manning)

It’s for developers
designing and building
applications with Neo4j

It’s not a collection of war
stories but I will refer to
real-world examples

It is about sharing
thoughts and lessons
learnt in a useful way

“If I'm to believe Twitter, half of the
earth's population are importing
Wikipedia into Neo4j, for very
obscure reasons.”

• What is Neo4j?
• Approaching graph-based applications
  • Design
  • Implementation
  • Test
• Use cases
• Lessons learnt

Neo4j is a solid foundation
on which to build graph-based applications

How should I approach
graph-based
applications?

Is there a useful way to
categorise graph-based
applications?

Domain-Centric
• Well-defined data model
• Data changes through user interactions
• Flexible but predictable data structure(s)
• Recommendation engines, social networks, etc.
• Top-down design

Data-Centric
• Complex connected data that typically models real-world networks
• Integrated from a variety of different sources
• Data can be unpredictable
• Telco networks, utility networks, etc.
• Bottom-up design

Typically applications fall
somewhere between
these 2 types

How can I use the
information available in
my graph?

• Search and pattern-matching
  • Find a recommendation based on behaviour
• Graph algorithms
  • Shortest path, disconnected components
• Optimisation
  • Maximise oil flow while minimising water

Graphs are naturally
data-driven

Use case 1:
Network Impact Analysis

Requirement: Identify the
impact of failing
components

Requirement: Identify
interesting patterns, such
as single points of failure

Labelled property graph
is a natural fit for the
model

Additional “dimensions” can be
added to capture abstract concepts:
network redundancy, load-balancing

Cypher queries are a
natural solution to delivering
the different requirements
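
A minimal sketch of the first requirement, written in plain Java so that it runs without a database. On the project this kind of question is answered with Cypher; the in-memory map, the component names and the `impactOf` helper below are hypothetical, and the model deliberately ignores redundancy and load-balancing.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ImpactAnalysis {

    /**
     * Returns every component that directly or transitively depends on the failed one.
     * The map goes from a component name to the components that depend on it directly
     * (a hypothetical, simplified model with no redundancy).
     */
    static Set<String> impactOf(String failed, Map<String, List<String>> dependents) {
        Set<String> impacted = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.add(failed);
        while (!toVisit.isEmpty()) {
            String current = toVisit.remove();
            for (String dependent
                    : dependents.getOrDefault(current, Collections.<String>emptyList())) {
                // Only enqueue unseen components, so cycles cannot cause an endless loop.
                if (!dependent.equals(failed) && impacted.add(dependent)) {
                    toVisit.add(dependent);
                }
            }
        }
        return impacted;
    }
}
```

With a router feeding two switches that each feed one server, `impactOf("router", ...)` returns all four downstream components, while the failure of a single switch only impacts its own server.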

Use case 2:
Oil flow optimisation

Requirement: Identify
candidate configurations
to maximise flow

Requirement: Identify the
most practical and valuable
adjustments to the network

Simply connected graph
with complex
components

• Start from an initial population of candidate solutions (individuals or phenotypes), ideally random
• Attribute a score to each solution using a fitness function
  • The only place with specific business knowledge
• Apply genetic operators to create a new generation
  • Cross-breeding to retain the best characteristics from each parent
  • Mutation to maintain diversity and to avoid converging to a local optimum too quickly
• Stop when you want!
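
These steps can be sketched as a small, generic genetic algorithm in plain Java. It is an illustration, not the implementation used on the project: the boolean-array encoding (for example, one flag per valve), the step that carries the current best individual over unchanged, the single-point crossover and the 1% mutation rate are all assumptions. The fitness function is passed in as a parameter, since it is the only place that needs business knowledge.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

final class GeneticSketch {

    // Hypothetical mutation rate: chance of flipping each gene of a new child.
    static final double MUTATION_RATE = 0.01;

    static boolean[] evolve(int genes, int populationSize, int generations,
                            ToDoubleFunction<boolean[]> fitness, Random random) {
        if (genes < 1 || populationSize < 2) {
            throw new IllegalArgumentException("Need at least 1 gene and 2 individuals");
        }
        // Start from a random initial population.
        List<boolean[]> population = new ArrayList<>();
        for (int i = 0; i < populationSize; i++) {
            boolean[] individual = new boolean[genes];
            for (int g = 0; g < genes; g++) {
                individual[g] = random.nextBoolean();
            }
            population.add(individual);
        }
        // Score individuals with the fitness function, highest score first.
        Comparator<boolean[]> bestFirst = Comparator.comparingDouble(fitness).reversed();

        // "Stop when you want": here, after a fixed number of generations.
        for (int generation = 0; generation < generations; generation++) {
            population.sort(bestFirst);
            List<boolean[]> next = new ArrayList<>();
            next.add(population.get(0).clone()); // assumption: keep the current best as is
            int parentPool = Math.max(2, populationSize / 2); // breed from the better half
            while (next.size() < populationSize) {
                boolean[] mother = population.get(random.nextInt(parentPool));
                boolean[] father = population.get(random.nextInt(parentPool));
                int cut = random.nextInt(genes + 1); // single-point crossover
                boolean[] child = new boolean[genes];
                for (int g = 0; g < genes; g++) {
                    child[g] = g < cut ? mother[g] : father[g];
                    if (random.nextDouble() < MUTATION_RATE) {
                        child[g] = !child[g]; // mutation keeps some diversity
                    }
                }
                next.add(child);
            }
            population = next;
        }
        population.sort(bestFirst);
        return population.get(0);
    }
}
```

A toy call would pass a fitness function that, say, counts the `true` flags in an individual; a real one would score a candidate network configuration.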

Is this even a use
case for Neo4j?

Persist and share
calculated solutions

Use Cypher queries to
interrogate solutions

• Don’t follow “best practices” blindly
• For domain-centric applications you can use a mapping framework, such as Spring Data Neo4j
• For data-centric applications, you should stay as close as possible to the graph model
• In any case, don’t try to hide the graph!

• Expressive
• Readable
• Maintainable
• Performant
• Cypher + the web console is the quickest way to experiment and to prototype solutions

Manage complexity
with domain knowledge

• Graph algorithms are typically complex
• Knowledge of the domain can simplify queries and traversals
  • Make Cypher queries as specific as possible
  • Take “shortcuts” when you know the domain

Write robust and
flexible code

• Break down problems into small queries. Return graph resources (or ids) to chain queries.
• Robustness principle: “Be conservative in what you do, be liberal in what you accept from others”
• Use assertions as preconditions
  • Assertions document intent
  • Fail fast if data doesn’t match
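
A minimal sketch of chaining two small queries through an id, with an assertion as a precondition between them. The two maps are hypothetical stand-ins for the results of two small queries (in practice these would be Cypher queries), and the class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ChainedQueries {

    // Hypothetical stand-in for query 1: component name -> matching node ids.
    private final Map<String, List<Long>> idsByName;
    // Hypothetical stand-in for query 2: node id -> ids of dependent nodes.
    private final Map<Long, List<Long>> dependentsById;

    ChainedQueries(Map<String, List<Long>> idsByName, Map<Long, List<Long>> dependentsById) {
        this.idsByName = idsByName;
        this.dependentsById = dependentsById;
    }

    /** Precondition: documents the intent "exactly one" and fails fast otherwise. */
    static long single(List<Long> ids, String what) {
        if (ids.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one " + what + " but found " + ids.size());
        }
        return ids.get(0);
    }

    /** Chains the two queries: resolve the name to one id, then query by that id. */
    List<Long> dependentsOf(String name) {
        List<Long> matches = idsByName.getOrDefault(name, Collections.<Long>emptyList());
        long id = single(matches, "component named '" + name + "'");
        return dependentsById.getOrDefault(id, Collections.<Long>emptyList());
    }
}
```

If the data contains no component, or two components, with the given name, the second query never runs and the error message says which assumption was violated.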

Start with a
representative dataset

• Create small datasets to capture the initial use cases
• Write simple unit tests using these datasets to support design and implementation
• These tests tend to become less useful when requirements are better understood
• Throw them away!

Move to a realistic
dataset as soon as
possible

• A realistic dataset
  • Should capture the complexity of the real data
  • Should be sufficiently large
  • Ideally based on production data
• Write functional and integration tests against this dataset

• Graph data is inherently flexible and evolving
• Queries need to be correct and sufficiently performant
• Existing queries’ performance can degrade as the underlying model changes
• Assertions on timeouts should be part of the test suite to detect loops and poor performance
  • JUnit’s @Test(timeout=5)
  • Spring’s @Timeout(value=5)
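
The same idea can be sketched without a test framework, using only java.util.concurrent. The helper name `assertCompletesWithin` is hypothetical, and the `Callable` stands in for whatever code runs the query. For comparison, the `timeout` attribute of JUnit 4's `@Test` is expressed in milliseconds.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutAssertion {

    /** Runs the query on a worker thread and fails if it exceeds the time budget. */
    static <T> T assertCompletesWithin(long millis, Callable<T> query) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> result = executor.submit(query);
        try {
            return result.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new AssertionError("Query did not complete within " + millis + " ms");
        } catch (ExecutionException e) {
            throw new IllegalStateException("Query failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the query", e);
        } finally {
            executor.shutdownNow(); // interrupts a query that is still running
        }
    }
}
```

A traversal that loops, or a query that has slowed down after a model change, then shows up as a failed assertion rather than a hanging build.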

Links
• Twitter: @tareq_abedrabbo
• Blog: http://www.terminalstate.net
• OpenCredo: http://www.opencredo.com

Thank you!
