This talk covers Neo4j architecture basics, helping you match Neo4j with the right technical problem. It also provides guidance for success in production and where Neo4j fits in your enterprise architecture stack.
2. Key Takeaways
1. Neo4j architecture basics, to help you match up Neo4j with the right technical problem
2. Some guidelines for success in production
3. Where Neo4j fits into your enterprise architecture
16. Database Management Systems: Five Key Sub-Patterns (Including SQL)
Graph: Graph Database
Tabular: RDBMS
Aggregate Oriented (3): Key-Value, Column-Family, Document Database
Source: Martin Fowler, NoSQL Distilled
17. Important Dimensions in Technology Selection
[Figure: two dimensions, Connectedness and Latency & Freshness (from Batch-Precompute to Real-Time)]
Illustration by David Somerville based on the original by Hugh MacLeod (@gapingvoid)
18. A View of the Data Management Portfolio
[Figure: RDBMS & Aggregate-Oriented NoSQL; Hadoop / MapReduce; Graph Database & Graph Compute Engine, drawn as a span across the chart]
21. Batch Processing vs. Real-Time Processing
Batch Processing
• Overnight/intermittent loading and calculations
• Results in lag between activity & knowledge response
• System-wide local pre-calculations are computationally inefficient
• Recommendations based on activity from yesterday
Real-Time Processing
• Real-time writes & reads
• Up-to-the-moment freshness
• “Just-in-time” processing, most efficient for “local” queries
• Recommendations that reflect your latest activity
22. Architectures for Leveraging Connectedness
Discrete Data (minimally connected data): Hadoop, Other NoSQL, Relational DBMS
Designed for Discrete Lookups & Aggregation
Architecture tradeoffs:
- Data Model Richness for Volume
- Performant Insight Into Connections
- Data Trustability (ACID)
Connected Data (focused on data relationships): Graph Database
Designed for Causality & Pattern-Based Queries
Architecture tradeoffs:
- Aggregation performance for arbitrary hop performance
- “Infinite scale” for large scale index-free relationship performance
26. #2 Benefit: “Minutes to Milliseconds” Real-Time Query Performance
[Figure: Response Time plotted against Connectedness and Size of Data Set]
Relational and Other NoSQL Databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections
Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections
1000x Advantage: “Minutes to milliseconds”
27. Benefit #3 of 3: Query Productivity
Example HR Query in SQL / The Same Query using Cypher
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate,
       count(report) AS Total
Project Impact
Less time writing queries:
• More time understanding the answers
• Leaving time to ask the next question
Less time debugging queries:
• More time writing the next piece of code
• Improved quality of overall code base
Code that’s easier to read:
• Faster ramp-up for new project members
• Improved maintainability & troubleshooting
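To make the variable-length patterns in the Cypher query concrete, here is a minimal Python sketch of what `[:MANAGES*0..3]` and `[:MANAGES*1..3]` compute. The org chart and the helper names (`MANAGES`, `within_hops`, `report_totals`) are hypothetical, and the sketch assumes a tree-shaped hierarchy, so that counting paths is the same as counting reports.

```python
from collections import defaultdict

# Hypothetical org chart: manager -> direct reports (assumed to be a tree).
MANAGES = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cid", "Dee"],
    "Cid": ["Eve"],
}

def within_hops(start, min_hops, max_hops):
    """Yield each person reached from `start` by min_hops..max_hops MANAGES steps."""
    frontier = [start]
    for depth in range(max_hops + 1):
        if depth >= min_hops:
            yield from frontier
        # Move one MANAGES step further down the hierarchy.
        frontier = [r for person in frontier for r in MANAGES.get(person, [])]

def report_totals(boss):
    """Mimic MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report)."""
    totals = defaultdict(int)
    for sub in within_hops(boss, 0, 3):
        for _report in within_hops(sub, 1, 3):
            totals[sub] += 1  # one result row per (sub, report) pair
    return dict(totals)

print(report_totals("John Doe"))
```

As in the Cypher version, people with no reports drop out of the result, because the second pattern finds no match for them.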
29. Key Ingredient #1 of 3: Graph Optimized Memory & Storage
Index-free adjacency
At Write Time: data is connected as it is stored
At Read Time: lightning-fast retrieval of data and relationships via pointer chasing
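The idea of index-free adjacency can be illustrated with a small Python sketch. It is not Neo4j's actual storage format: the `Node` class and `hops` function are hypothetical. It only shows the principle that each node holds direct references to its neighbours, so a traversal follows references instead of consulting a global index.

```python
class Node:
    """A record holding direct references to its neighbours (hypothetical layout)."""

    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to adjacent Node objects

    def connect(self, other):
        # Write time: the relationship is stored as a reference on the node itself.
        self.out.append(other)


def hops(start, depth):
    """Read time: follow references outward; no global index lookup is involved."""
    frontier, seen = [start], {start}
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for neighbour in node.out:
                if neighbour not in seen:
                    seen.add(neighbour)
                    nxt.append(neighbour)
        frontier = nxt
    return [n.name for n in frontier]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.connect(bob)
bob.connect(carol)
print(hops(alice, 2))  # ['Carol']
```

The cost of each hop depends on the number of neighbours visited, not on the total size of the data set.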
32. “Why Neo4j”: What We Hear From Users
ACID Transactions
• ACID transactions with causal consistency
• Neo4j Security Foundation delivers enterprise-class security and control
Performance
• Index-free adjacency delivers millions of hops per second
• In-memory pointer chasing for fast query results
Agility
• Native property graph model
• Modify schema as business changes without disrupting existing data
Developer Productivity
• Easy to learn, declarative openCypher graph query language
• Procedural language extensions
• Open library of procedures and functions (APOC)
• Neo4j support and training
• Worldwide developer community
… all backed by Neo’s track record of leadership and product roadmap
Hardware Efficiency
• Native graph query processing and storage requires 10x less hardware
• Index-free adjacency requires 10x less CPU
34. Confidential - Neo Technology, Inc.
#1: Get to know the “Whole Product”
Cloud
IaaS, PaaS, DBaaS
Marketplace
Companion Service
Education
Documents
Online Training
Classroom
Custom Onsite
OSS
Community
Foundations
LDBC, openCypher
Events
Forums
Add-Ons
Tech
Ecosystem
Tech Partners
Graph Solutions
Data Science
Architecture
Data Models
Partners
System Integrators
Trainers
OEMs
Commercial
Support
Technical Support
Packaged Services
Custom Services
43. Causal Clustering Architecture Optimizes for Cost-Consistency at Query Time
[Figure: consistency levels arranged by QUERY COST]
REPLICA QUERIES: Read Any, Read Your Own Writes
CORE QUERIES: Read Any, Read Your Own Writes, Linearizable (Future 3.x)
Other labels on the figure: QUORATE; “The Holy Grail of Distributed Systems”
ENTERPRISE EDITION
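The difference between "Read Any" and "Read Your Own Writes" can be sketched with a toy simulation. Neo4j causal clustering exposes this through bookmarks that a client passes from a write to a later read; everything else here (the `Core` and `Replica` classes, the immediate error instead of waiting for the replica to catch up) is a simplifying assumption, not the product's implementation.

```python
class Core:
    """Hypothetical core member: accepts writes and numbers the transactions."""

    def __init__(self):
        self.log = []  # committed writes, in commit order

    def write(self, value):
        self.log.append(value)
        return len(self.log)  # bookmark: id of the last committed transaction


class Replica:
    """Hypothetical read replica that applies the core's log asynchronously."""

    def __init__(self, core):
        self.core = core
        self.applied = 0  # number of transactions applied so far

    def catch_up(self):
        self.applied = len(self.core.log)

    def read(self, bookmark=0):
        # "Read any": bookmark 0 accepts whatever state the replica has.
        # "Read your own writes": refuse to answer before the bookmark is applied.
        if self.applied < bookmark:
            raise TimeoutError("replica has not applied the bookmarked write yet")
        return self.core.log[: self.applied]


core = Core()
replica = Replica(core)
bookmark = core.write("order-1")

print(replica.read())          # [] : cheap, but possibly stale
replica.catch_up()
print(replica.read(bookmark))  # ['order-1'] : guaranteed to include the write
```

The cheaper read tolerates staleness; the bookmarked read pays for consistency by waiting until the replica has caught up (modelled here as an error if it has not).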
46. Neo4j Security Foundation: Enterprise-Class Security and Control
Satisfy enterprise admin and database security requirements
• Flexible authentication options: Active Directory/LDAP or native users
• Role-based authorization
• List and kill running queries
• Access controls for user-defined procedures (enables subgraph access control)
• Query logging and security event logging (passes through originating end user)
• Extensible auth plugin architecture (Kerberos support coming soon!)
Enables Sarbanes-Oxley, HIPAA, PCI-DSS, et al.
PREDEFINED ROLES
Privileges                   Reader  Publisher  Architect  Admin
Change own password            •        •          •         •
Read data                      •        •          •         •
View own details               •        •          •         •
Terminate own query            •        •          •         •
Write/update/delete data                •          •         •
Manage index/constraints                           •         •
Terminate others’ queries                                    •
ENTERPRISE EDITION
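The predefined roles table can be read as a ladder in which each role inherits the privileges of the one before it. The following Python sketch encodes that reading. It assumes the bullets in each row belong to the rightmost columns, and the names `ROLES`, `MINIMUM_ROLE` and `is_allowed` are hypothetical, not part of any Neo4j API.

```python
# Roles ordered from least to most privileged, as in the table header.
ROLES = ["reader", "publisher", "architect", "admin"]

# Lowest role that holds each privilege (assumed from the table's bullet counts).
MINIMUM_ROLE = {
    "change own password": "reader",
    "read data": "reader",
    "view own details": "reader",
    "terminate own query": "reader",
    "write/update/delete data": "publisher",
    "manage index/constraints": "architect",
    "terminate others' queries": "admin",
}


def is_allowed(role, privilege):
    """Return True if `role` is at or above the minimum role for `privilege`."""
    return ROLES.index(role) >= ROLES.index(MINIMUM_ROLE[privilege])


print(is_allowed("publisher", "write/update/delete data"))  # True
print(is_allowed("reader", "write/update/delete data"))     # False
```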
55. Relational Database
A way of representing data
Good for:
• Well-understood data structures that don’t change too frequently
• Known problems involving discrete parts of the data, or minimal connectivity
56. Graph Database and Relational Database
A way of representing data
Graph Database
Good for:
• Dynamic systems: where the data topology is difficult to predict
• Dynamic requirements: that evolve with the business
• Problems where the relationships in data contribute meaning & value
Relational Database
Good for:
• Well-understood data structures that don’t change too frequently
• Known problems involving discrete parts of the data, or minimal connectivity
58. Graph is easy to learn, hard to master
• Common issues your team will hit
  • Underestimate graph complexity
  • Complaints of slow queries
  • Undersized hardware, especially memory, but also CPU
  • Ambitious number of future nodes
  • Bad scaling topology / architecture assumptions
  • Disappointing ‘Write’ speed
  • Deep analytics mismatch
• You still need your 10,000 hours
  • 8,760 hours in a year, so depending on how long you sleep, 5-7 years.