In this talk, Tareq discusses graph solutions based on his experience building a varied mix of graph-based systems. He shares techniques and approaches he has learned, focusing on a number of concepts that can be applied in a wider context.
2. About me
• CTO/Principal Consultant at OpenCredo
• Working with Neo4j for (almost) 3 years on a number of different projects
• Co-author of Neo4j in Action (Manning)
19. Domain-Centric
• Well-defined data model
• Data changes through user interactions
• Flexible but predictable data structure(s)
• Recommendation engines, social networks, etc.
• Top-down design
20. Data-Centric
• Complex connected data that typically models real-world networks
• Integrated from a variety of different sources
• Data can be unpredictable
• Telco networks, utility networks, etc.
• Bottom-up design
22. How can I use the information available in my graph?
23.
• Search and pattern-matching
  • Find a recommendation based on behaviour
• Graph algorithms
  • Shortest path, disconnected components
• Optimisation
  • Maximise oil flow while minimising water
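
The two graph algorithms named on this slide can be illustrated on a toy graph. The following is only a sketch in plain Python, not Neo4j code: the `GRAPH` adjacency map and the function names are hypothetical, and a graph database would normally provide these algorithms itself.

```python
from collections import deque

# Hypothetical toy graph as an undirected adjacency map (not from the talk).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c"},
    "e": {"f"},
    "f": {"e"},
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path as a list of nodes, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def components(graph):
    """Group nodes into connected components (the 'disconnected' parts of the graph)."""
    seen, result = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        result.append(component)
    return result
```

A call such as `shortest_path(GRAPH, "a", "e")` returns `None` because the two nodes sit in different components, which `components(GRAPH)` makes explicit.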
38.
• Start from an initial population of candidate solutions (individuals or phenotypes), ideally random
• Attribute a score to each solution using a fitness function
  • The only place with specific business knowledge
• Apply genetic operators to create a new generation
  • Cross-breeding to retain the best characteristics from each parent
  • Mutation to maintain diversity and to avoid converging to a local optimum too quickly
• Stop when you want!
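
The steps on this slide can be sketched as a small genetic algorithm. Everything specific here is an assumption made for illustration: individuals are bit lists, the fitness function simply counts 1 bits, half of each generation is kept as parents, and the best individual is carried over unchanged. The talk does not say which encoding or operators were actually used.

```python
import random


def fitness(individual):
    """Hypothetical fitness function (count of 1 bits): the only domain-specific part."""
    return sum(individual)


def crossover(mother, father, rng):
    """Single-point cross-breeding: a prefix from one parent, the suffix from the other."""
    point = rng.randint(1, len(mother) - 1)
    return mother[:point] + father[point:]


def mutate(individual, rng, rate=0.05):
    """Flip each bit with a small probability to keep diversity in the population."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def evolve(length=20, population_size=30, generations=40, seed=1):
    """Return (best initial fitness, best individual after the final generation)."""
    rng = random.Random(seed)
    # Start from a random initial population of candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    initial_best = max(map(fitness, population))
    for _ in range(generations):  # "stop when you want": here, a fixed budget
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - 1)
        ]
        # Carry the best individual over unchanged so the best score never drops.
        population = [ranked[0]] + children
    return initial_best, max(population, key=fitness)
```

Only `fitness` would need to change to target a different problem, which matches the slide's point that it is the one place holding business knowledge.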
46.
• Don’t follow “best practices” blindly
• For domain-centric applications you can use a mapping framework, such as Spring Data Neo4j
• For data-centric applications, you should stay as close as possible to the graph model
• In any case, don’t try to hide the graph!
50.
• Graph algorithms are typically complex
• Knowledge of the domain can simplify queries and traversals
• Make Cypher queries as specific as possible
• Take “shortcuts” when you know the domain
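
To illustrate how domain knowledge narrows a traversal, here is a minimal sketch over an in-memory edge list. It is plain Python rather than Cypher, and the node names, relationship types, and the `reachable` helper are all hypothetical. The idea is the same one the slide applies to Cypher: naming the relationship types and bounding the depth gives the traversal far less to explore.

```python
# Hypothetical typed edges: (source, relationship type, target).
EDGES = [
    ("router1", "CONNECTS_TO", "switch1"),
    ("switch1", "CONNECTS_TO", "server1"),
    ("server1", "OWNED_BY", "team_a"),
    ("team_a", "OWNS", "server2"),
    ("router1", "MONITORED_BY", "probe1"),
]


def reachable(edges, start, rel_types=None, max_depth=None):
    """Nodes reachable from start, optionally restricted by relationship type and depth."""
    found, frontier, depth = set(), {start}, 0
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier = set()
        for source, rel, target in edges:
            if source in frontier and (rel_types is None or rel in rel_types):
                if target not in found and target != start:
                    next_frontier.add(target)
        found |= next_frontier
        frontier = next_frontier
        depth += 1
    return found


# Generic traversal: follows every relationship type to any depth.
everything = reachable(EDGES, "router1")

# Domain-specific traversal: only physical links, at most two hops away.
nearby_devices = reachable(EDGES, "router1", rel_types={"CONNECTS_TO"}, max_depth=2)
```

The generic call visits every node in the example, while the specific one touches only the two devices that the question is actually about.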
52.
• Break down problems into small queries. Return graph resources (or ids) to chain queries.
• Robustness principle: “Be conservative in what you do, be liberal in what you accept from others”
• Use assertions as preconditions
• Assertions document intent
• Fail fast if data doesn’t match
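
A minimal sketch of chaining small queries through ids, with assertions as preconditions. The `NODES` and `RELS` structures are a hypothetical in-memory stand-in for a graph store, and the three functions are illustrative names; against a real database each function would wrap one small query.

```python
# Hypothetical in-memory stand-in for a graph store: node id -> properties.
NODES = {
    1: {"label": "Customer", "name": "Alice"},
    2: {"label": "Order", "total": 40},
    3: {"label": "Order", "total": 60},
}
# Hypothetical relationships: (start id, type, end id).
RELS = [(1, "PLACED", 2), (1, "PLACED", 3)]


def find_customer_id(name):
    """Small query 1: return the id of the single customer with this name."""
    ids = [node_id for node_id, props in NODES.items()
           if props["label"] == "Customer" and props.get("name") == name]
    # Fail fast: later steps assume exactly one match.
    assert len(ids) == 1, f"expected one customer named {name!r}, found {len(ids)}"
    return ids[0]


def find_order_ids(customer_id):
    """Small query 2: takes the id returned by query 1 and returns order ids."""
    # The assertion documents the intent: this step only makes sense for customers.
    assert NODES[customer_id]["label"] == "Customer", "input must be a Customer node"
    return [end for start, rel, end in RELS
            if start == customer_id and rel == "PLACED"]


def order_total(order_ids):
    """Small query 3: consumes the ids returned by query 2."""
    assert all(NODES[i]["label"] == "Order" for i in order_ids), "inputs must be Order nodes"
    return sum(NODES[i]["total"] for i in order_ids)


total = order_total(find_order_ids(find_customer_id("Alice")))
```

Python drops `assert` statements when run with `-O`, so code that must keep these checks in production might raise explicit exceptions instead.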
54.
• Create small datasets to capture the initial use cases
• Write simple unit tests using these datasets to support design and implementation
• These tests tend to become less useful when requirements are better understood
• Throw them away!
55. Move to a realistic dataset as soon as possible
56.
• A realistic dataset
• Should capture the complexity of the real data
• Should be sufficiently large
• Ideally based on production data
• Write functional and integration tests against this dataset
58.
• Graph data is inherently flexible and evolving
• Queries need to be correct and sufficiently performant
• The performance of existing queries can degrade as the underlying model changes
• Assertions on timeouts should be part of the test suite to detect loops and poor performance
• JUnit’s @Test(timeout=5)
• Spring’s @Timeout(value=5)
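
The annotations above belong to Java test frameworks. As a language-neutral illustration of the same idea, here is a sketch using only the Python standard library. `run_with_timeout`, `fast_query`, and `slow_query` are hypothetical names, and the sleep stands in for a traversal that loops or performs badly.

```python
import concurrent.futures
import time


def run_with_timeout(query, timeout_seconds):
    """Run query() in a worker thread; raise AssertionError if it exceeds the time budget."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(query)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        raise AssertionError(f"query exceeded {timeout_seconds}s") from None
    finally:
        # Do not block on a runaway query; note the worker thread itself is not killed.
        executor.shutdown(wait=False)


def fast_query():
    """Hypothetical query that returns promptly."""
    return ["node-1", "node-2"]


def slow_query():
    """Hypothetical query that takes too long (stands in for a looping traversal)."""
    time.sleep(0.5)
    return []
```

In a test, `run_with_timeout(fast_query, 1.0)` returns the result, while `run_with_timeout(slow_query, 0.05)` fails the test. A real suite against a database would also need to cancel the query on the server side, which this sketch does not attempt.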