Graph Databases & Neo4J
Girish Khanzode
Graph Databases
• Graph Based NoSQL Database
• Property Graph Model
• Neo4j
• Neo4j Architecture
• Data Storage
• Programmatic Data Access
• Core API
• Lucene
• Auto Index lifecycle
• Traversers API
• Cypher
• Graph Algorithms
• Neo4j HA
• Cache Sharding
• References
Graphs
• A collection of nodes (things) and edges (relationships) that
connect pairs of nodes
– Suitable for any data that is related
• Can attach properties (key-value pairs) on nodes and
relationships
• Relationships connect two nodes and both nodes and
relationships can hold an arbitrary amount of key-value pairs
Graph Relations are Universal
Graph
Graphs
• Well-understood patterns and algorithms
– Studied since Leonhard Euler's Seven Bridges of Königsberg (1736)
– Codd's Relational Model (1970)
• Knowledge graph - beyond links, search is smarter when considering how things
are related
• Facebook graph search – people interested in finding things in their part of the
world
• Bing + Britannica: referencing and cross-referencing
• People - relationships to people, to organizations, to places, to things - personal
graph
A Graph Database
• Relationships are first-class citizens
• NoSQL database optimized for connected data
– Social networking, logistics networks, recommendation engines
– Relationships are as important as the records
– Can be up to about 1000 times faster than an RDBMS for connected-data queries (see the performance experiment in this deck)
• Uses graph structures with nodes, edges and properties to store data
• Graph databases - Neo4j, InfiniteGraph, InfoGrid, OrientDB
• Very fast querying across records
Graph Database
A Graph Database
• Transactional with the usual operations
• RDBMS - can tell sales in last year
• Graph database – can tell customer which book to buy next
• Index-free adjacency
– Every node holds direct pointers to its adjacent elements, so no index lookup is needed to follow a relationship
• Edges hold most of the important information and relations
– nodes to other nodes
– nodes to properties
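Index-free adjacency can be illustrated with a minimal in-memory sketch. The class and method names below (`AdjacencySketch`, `connect`, `neighbours`) are hypothetical and are not part of Neo4j; the point is only that each node keeps direct references to its relationships, so finding neighbours is a walk over those references rather than a lookup in a global index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration; not Neo4j's storage format or API.
class AdjacencySketch {

    static final class Node {
        final Map<String, Object> properties = new HashMap<>();
        // Direct references to this node's relationships: no index lookup needed.
        final List<Relationship> relationships = new ArrayList<>();
    }

    static final class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Creates a directed relationship and registers it on both end nodes.
    static Relationship connect(Node start, Node end, String type) {
        Relationship r = new Relationship(start, end, type);
        start.relationships.add(r);
        end.relationships.add(r);
        return r;
    }

    // Neighbours reachable over outgoing relationships of the given type.
    static List<Node> neighbours(Node node, String type) {
        List<Node> result = new ArrayList<>();
        for (Relationship r : node.relationships) {
            if (r.start == node && r.type.equals(type)) {
                result.add(r.end);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node rose = new Node();
        rose.properties.put("first name", "Rose");
        Node doctor = new Node();
        doctor.properties.put("name", "the Doctor");
        connect(rose, doctor, "COMPANION_OF");
        System.out.println(neighbours(rose, "COMPANION_OF").get(0).properties.get("name"));
    }
}
```

The cost of `neighbours` depends on the number of relationships attached to the one node, not on the total size of the graph.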
Graph Based NoSQL Database
• No rigid format of SQL or the tables and columns representation
• Uses a flexible graphical representation - addresses scalability concerns
• Data can be easily transformed from one model to the other using a
graph based NoSQL database
• Nodes are organised by some relationships with one another represented
by edges between the nodes
• Both nodes and the relationships have some defined properties
Graph Based NoSQL Database
• Labelled, directed, attributed multi-graph - a graph contains nodes that carry labels and
properties, and the relationships between nodes are shown as directed edges
• While a relational database model can replicate a graph model, every
edge would require a join, which is costly
Advantages
• Easier Relationships Analysis
• Very fast for associative data sets
– Like social networks
• Map more directly to object oriented applications
– Object classification and Parent->Child relationships
Disadvantages
• If data is just tabular with not much relationship between the
data, graph databases do not fare well
• OLAP support for graph databases not mature
Performance Experiment
• Compute whether a path exists between two people in a social network
• 1000 persons
• Average 50 friends per person
• pathExists(a, b) limited to depth 4

Database              # persons    Query time
Relational database   1,000        2,000 ms
Neo4j                 1,000        2 ms
Neo4j                 1,000,000    2 ms
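The deck does not show how pathExists(a, b) was implemented. One plausible reading is a breadth-first search that stops after four hops, sketched below over a plain adjacency map. The class name `PathExistsSketch` and the `friends` map are assumptions made for illustration; this is not the benchmark code and it does not use the Neo4j API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a depth-limited path check; not the code used in the experiment.
class PathExistsSketch {

    // friends maps each person ID to the IDs of that person's friends.
    static boolean pathExists(Map<Integer, List<Integer>> friends, int a, int b, int maxDepth) {
        if (a == b) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(a);
        Deque<Integer> frontier = new ArrayDeque<>();
        frontier.add(a);
        // Expand one hop per iteration, at most maxDepth hops in total.
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<Integer> next = new ArrayDeque<>();
            for (int person : frontier) {
                for (int friend : friends.getOrDefault(person, List.of())) {
                    if (friend == b) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        // Chain 1 - 2 - 3 - 4 - 5 - 6, stored in both directions.
        Map<Integer, List<Integer>> friends = Map.of(
                1, List.of(2),
                2, List.of(1, 3),
                3, List.of(2, 4),
                4, List.of(3, 5),
                5, List.of(4, 6),
                6, List.of(5));
        System.out.println(pathExists(friends, 1, 5, 4)); // person 5 is 4 hops away
        System.out.println(pathExists(friends, 1, 6, 4)); // person 6 is 5 hops away
    }
}
```

The work done depends on the neighbourhood visited within four hops rather than on the total number of persons, which is consistent with the flat Neo4j timings in the table.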
Property Graph Model
name: the Doctor
age: 907
species: Time Lord
first name: Rose
last name: Tyler
vehicle: Skoda
model: Type 40
Graphs - Whiteboard-friendly
• No decomposition, ER design, or normalization / denormalization,
as is needed with an RDBMS
Neo4j
• A Graph Database
• A Property Graph containing Nodes, Relationships with Properties on
both
• Manage complex, highly connected data
• Scalable - High-performance with High-Availability
– Traverse 1,000,000+ relationships / second on commodity hardware
• Server with REST API, or Embeddable on the JVM
Neo4j
• Full ACID transactions
• Schema free, bottom-up data model design
• Stable
• Easier than RDBMS since no need for normalization
• Implemented in Java
• Open Source
Neo4j
• Schema free – Data does not have to adhere to any convention
• Support for a wide variety of languages - Java, Python, Perl, Scala, plus Cypher as the query language
• A graph database can be thought of as a key-value store, with full support
for relationships.
• Graph databases don’t avoid design efforts
• Good design still requires effort
Why Neo4J?
• The internet is a network of pages connected to each other.
What is a better way to model that than in graphs?
• No time lost fighting with less expressive data-stores
• Easy to implement experimental features
• A single instance of Neo4j can house about 34 billion nodes,
34 billion relationships and at least 68 billion properties
Neo4j Architecture
[Figure: layered architecture diagram. Labels: Java, Ruby, Clojure…; JVM Language Bindings; REST API; Graph Matching; Traversal Framework; Core API; Caches; Memory-Mapped (N)IO; Filesystem]
Software Architecture
Data Storage
• Neo4j stores graph data in a number of different store files
• Each store file contains the data for a specific part of the
graph
– neostore.nodestore.db
– neostore.relationshipstore.db
– neostore.propertystore.db
– neostore.propertystore.db.index
– neostore.propertystore.db.strings
– neostore.propertystore.db.arrays
Node Store
• Size: 9 bytes
– 1st byte - in-use flag
– Next 4 bytes - ID of first relationship
– Last 4 bytes - ID of first property of node
• Fixed size records enable fast lookups
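Fixed-size records make lookups fast because a record's position follows directly from its ID. The sketch below uses the 9-byte layout listed above (1 + 4 + 4 bytes). The class name `NodeRecordSketch`, the big-endian byte order and the in-memory byte array are assumptions for illustration; the real store file layout may differ in detail.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of fixed-size record addressing; not Neo4j's actual store code.
class NodeRecordSketch {
    static final int RECORD_SIZE = 9; // 1 (in-use) + 4 (first relationship) + 4 (first property)

    // With fixed-size records, the offset is a single multiplication.
    static long offsetOf(long nodeId) {
        return nodeId * RECORD_SIZE;
    }

    // Packs one record in the order given on the slide (byte order is an assumption).
    static byte[] encode(boolean inUse, int firstRelationshipId, int firstPropertyId) {
        return ByteBuffer.allocate(RECORD_SIZE)
                .put((byte) (inUse ? 1 : 0))
                .putInt(firstRelationshipId)
                .putInt(firstPropertyId)
                .array();
    }

    // Reads the record for nodeId out of an in-memory copy of the store.
    // Returns {inUse, firstRelationshipId, firstPropertyId}.
    static int[] decode(byte[] store, int nodeId) {
        ByteBuffer record = ByteBuffer.wrap(store, (int) offsetOf(nodeId), RECORD_SIZE);
        int inUse = record.get();
        int firstRelationshipId = record.getInt();
        int firstPropertyId = record.getInt();
        return new int[] {inUse, firstRelationshipId, firstPropertyId};
    }

    public static void main(String[] args) {
        ByteBuffer store = ByteBuffer.allocate(3 * RECORD_SIZE);
        store.put(encode(true, 10, 100));
        store.put(encode(false, 0, 0));
        store.put(encode(true, 12, 102));
        int[] node2 = decode(store.array(), 2);
        System.out.println("offset=" + offsetOf(2) + " firstRel=" + node2[1] + " firstProp=" + node2[2]);
    }
}
```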
Relationship store
• neostore.relationshipstore.db
• Size: 33 bytes
• 1st byte - In use flag
• Next 8 bytes - IDs of the nodes at the start and end of the relationship
• 4 bytes - Pointer to the relationship type
• 16 bytes - pointers to the next and previous relationship records for each of the start and end nodes
(the relationship chain)
• 4 bytes - next property id
Relationships Storage
Data Size
Element              Limit
nodes                2^35 (∼ 34 billion)
relationships        2^35 (∼ 34 billion)
properties           2^36 to 2^38 depending on property types (maximum ∼ 274 billion, always at least ∼ 68 billion)
relationship types   2^15 (∼ 32,000)
Neo4j API – Logical View
Programmatic Data Access
• Java APIs - JVM languages bind to the same APIs
• JRuby, Jython, Clojure, Scala…
• Manage nodes and relationships
• Indexing – find data without traversal
• Traversing
• Path finding
• Pattern matching
Core API
• Deals with graphs in terms of their fundamentals
• Nodes - properties
– KV Pairs
• Relationships
– Start node
– End node
– Properties
• KV Pairs
Create Node
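// Embedded Java API as used in this deck (older Neo4j releases)
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo");
Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("character", "the Doctor");
    tx.success();   // mark the transaction as successful
} finally {
    tx.finish();    // commits if success() was called, otherwise rolls back
}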
Create Relationships
// db is the GraphDatabaseService created in the previous example
Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("character", "The Doctor");
    Node susan = db.createNode();
    susan.setProperty("firstname", "Susan");
    susan.setProperty("lastname", "Campbell");
    // Directed relationship: susan -[COMPANION_OF]-> theDoctor
    susan.createRelationshipTo(theDoctor, DynamicRelationshipType.withName("COMPANION_OF"));
    tx.success();
} finally {
    tx.finish();
}
Index a Graph
• Graphs themselves are indexes
• Can create short-cuts to well-known nodes
• In program, keep a reference to any interesting node
• Indexes offer flexibility in what constitutes an “interesting
node”
Lucene
• The default index implementation for Neo4j
– Default implementation for IndexManager
• Supports many indexes per database
• Each index supports nodes or relationships
• Supports exact and regex-based matching
• Supports scoring
– Number of hits in the index for a given item
– Great for recommendations
Create a Node Index
GraphDatabaseService db = …
Index<Node> planets = db.index().forNodes("planets");
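– Index<Node> / forNodes: the type being indexed
– "planets": the index name
– forNodes(...): creates the index or retrieves an existing one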
Create a Relationship Index
GraphDatabaseService db = …
Index<Relationship> enemies = db.index().forRelationships("enemies");
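– Index<Relationship> / forRelationships: the type being indexed
– "enemies": the index name
– forRelationships(...): creates the index or retrieves an existing one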
Exact Matches
GraphDatabaseService db = …
Index<Node> actors = db.index().forNodes("actors");
Node rogerDelgado = actors.get("actor", "Roger Delgado").getSingle();
– "actor": the key
– "Roger Delgado": the value to match
– getSingle(): first match only
Query Matches
GraphDatabaseService db = …
Index<Node> species = db.index().forNodes("species");
IndexHits<Node> speciesHits = species.query("species", "S*n");
– "species": the key
– "S*n": the query
Transactions to Mutate Indexes
• Mutating access is still protected by transactions which cover both index and graph
GraphDatabaseService db = …
Transaction tx = db.beginTx();
try {
    Node nixon = db.createNode();
    nixon.setProperty("character", "Richard Nixon");
    // The index update runs inside the same transaction as the graph update
    db.index().forNodes("characters").add(nixon, "character", nixon.getProperty("character"));
    tx.success();
} finally {
    tx.finish();
}
Auto Index lifecycle
• Auto Index - stays consistent with the graph data
• Specify the property name to index when the auto index is set up
• If a node/relationship or property is removed from the graph, it is removed
from the index
• If the database is started with auto indexing enabled but with different auto indexed
properties than the last run, then already auto-indexed entities will be
removed from the index as they are worked upon
• Re-indexing is a manual process
– Existing properties are not indexed unless touched
Auto Index lifecycle
AutoIndexer<Node> nodeAutoIndex = graphDb.index().getNodeAutoIndexer();
nodeAutoIndex.startAutoIndexingProperty("species");
nodeAutoIndex.setEnabled( true );
ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
Node and Relationship Indexes Supported
Core API
• Basic (nodes, relationships)
• Fast
• Imperative
• Flexible - Easily intermix mutating operations
Traversers API
• Mechanisms to query graph navigating from starting node to
related nodes according to algorithm to get answers
• Expressive
• Fast
• Declarative (mostly)
• Opinionated
Cypher - A Graph Query Language
• Query Language for Neo4j
• A declarative graph pattern matching language
– SQL for graphs
– Tabular results
• aggregation, ordering and limits
• Mutating operations
• CRUD
• Easy to formulate queries based on relationships
• Many features stem from improving pain points of SQL like join tables
Cypher - A Graph Query Language
Cypher Query
• Query:
MATCH (n:Crew)-[r:KNOWS*]-(m)
WHERE n.name = 'Neo'
RETURN n AS Neo, r, m
Operations
• Aggregation - COUNT, SUM, AVG, MAX, MIN, COLLECT
• Where clause
start doctor = node:characters(name = 'Doctor')
match (doctor)<-[:PLAYED]-(actor)-[:APPEARED_IN]->(episode)
where actor.actor = 'Tom Baker' and episode.title =~ /.*Dalek.*/
return episode.title
• Ordering
– order by <property>
– order by <property> desc
Graph Algorithms
• Neo4j has built-in algorithms
• Callable through JVM and REST APIs
• Higher level of abstraction
• Graph Matching
– Look for patterns in a data set - retail analytics
– Higher-level abstraction than raw traversers
• REST API
– Access the server
• Binary protocol
– JSON as default format
Neo4j HA - High Availability Cluster
• A scalability package known as high availability or HA that
uses a master-slave cluster architecture
– Full data redundancy
– Service fault tolerance
– Linear read scalability
– Master-slave replication
• Single data-centre or global zones
– tolerance for high-latency
Neo4j HA
• Redundancy - improved uptime
– automatic failover
• In a Neo4j HA cluster the full graph is replicated to every instance in the
cluster
• Read operations can be done locally on each slave
• Read capacity of the HA cluster increases linearly with the number of
servers
Neo4j HA
HA Cluster Architecture
• Cluster performs automatic master election
• Supports master-slave replication for clustering and DR
across sites
HA Cluster Architecture
Write to a Master
• All write operations are co-ordinated by the master
• Writes to the master are fast
• Slaves eventually catch up
Write to a Master
Write to a Slave
• Writes to a slave cause a synchronous transaction
with the master
• Other slaves eventually catch up
Write to a Slave
Server Overload Problem
• Unlike other classes of NOSQL database, a graph does not
have predictable lookup since it is a highly mutable structure
• We want to co-locate related nodes for traversal
performance, but we don’t want to place so many connected
nodes on the same database that it becomes heavily loaded
• The black-hole problem - popular nodes get lumped together
on a single instance, but there is low point cut
Server Overload Problem
Thinly Spread Network
• The opposite is also true: we don’t want connected nodes spread too widely
across different database instances, since traversals that cross the
(relatively high-latency) network incur a substantial performance penalty
at runtime
• Load-leveling alone can lead to many relationships crossing instances
• These are very expensive to traverse, networks are many orders of
magnitude slower than in-memory traversals
Thinly Spread Network
Minimal Point Cut
• The best approach is to balance a graph across database instances by
creating a minimum point cut for a graph, where graph nodes are placed
such that there are few relationships that span shards
• Good strategy is to take a local view of the graph (no global locks) and
work incrementally (short bursts)
• Take into account use patterns
• Unlike other NoSQL stores, graphs are not predictable, so we cannot use
techniques like consistent hashing for scale out
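The goal of keeping few relationships spanning shards can be made concrete by counting them for a candidate placement. The sketch below is only a way to compare placements; it is not a partitioning algorithm and not a Neo4j feature, and the names `CutSizeSketch`, `crossShardRelationships` and `shardOf` are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for comparing node placements; not part of Neo4j.
class CutSizeSketch {

    // Counts relationships whose two end nodes are placed on different shards.
    // Each relationship is an int[] {startNodeId, endNodeId}.
    static int crossShardRelationships(List<int[]> relationships, Map<Integer, Integer> shardOf) {
        int crossing = 0;
        for (int[] rel : relationships) {
            if (!shardOf.get(rel[0]).equals(shardOf.get(rel[1]))) {
                crossing++;
            }
        }
        return crossing;
    }

    public static void main(String[] args) {
        // Two triangles (1-2-3 and 4-5-6) joined by a single relationship 3-4.
        List<int[]> relationships = List.of(
                new int[] {1, 2}, new int[] {2, 3}, new int[] {1, 3},
                new int[] {4, 5}, new int[] {5, 6}, new int[] {4, 6},
                new int[] {3, 4});

        // Placement that keeps each triangle together.
        Map<Integer, Integer> byCluster = Map.of(1, 0, 2, 0, 3, 0, 4, 1, 5, 1, 6, 1);
        // Placement that levels load by node ID parity, ignoring structure.
        Map<Integer, Integer> byParity = Map.of(1, 0, 2, 1, 3, 0, 4, 1, 5, 0, 6, 1);

        System.out.println(crossShardRelationships(relationships, byCluster)); // 1
        System.out.println(crossShardRelationships(relationships, byParity));  // 5
    }
}
```

Both placements put three nodes on each shard, but the structure-aware one leaves a single relationship crossing the network, which illustrates why load-leveling alone is not enough.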
Minimal Point Cut
Cache Sharding
• A strategy for large data sets of terabyte scale
• Mandates consistent request routing
• For instance, requests for user A are always sent to server 1,
while requests for user B are always sent to server 2 and so on
• The key assumption is that requests for user A typically touch
parts of the graph around user A, such as his or her friends,
preferences, likes and so on
Cache Sharding
• This means that the neighbourhood of the graph around user
A will be cached on server 1, while the neighbourhood around
user B will be cached on server 2
• By employing consistent routing of requests, the caches of all
servers in the HA cluster can be utilized maximally
• Strategy is highly effective for managing a large graph that
does not fit in RAM
Consistent Routing
• Always try to route related requests to the same server to hopefully
benefit from warm caches
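A minimal sketch of consistent request routing, assuming requests carry a user ID and the cluster has a fixed number of servers. The class `ConsistentRoutingSketch` and the method `serverFor` are hypothetical; in practice this logic would typically live in a load balancer or in the application tier, and the deck does not prescribe a specific scheme.

```java
// Hypothetical routing helper; not a Neo4j API.
class ConsistentRoutingSketch {

    // Maps a user ID to a server index in [0, serverCount).
    // String.hashCode() is deterministic, so the same user always maps to the
    // same server as long as serverCount does not change.
    static int serverFor(String userId, int serverCount) {
        return Math.floorMod(userId.hashCode(), serverCount);
    }

    public static void main(String[] args) {
        int servers = 3;
        for (String user : new String[] {"userA", "userB", "userA"}) {
            System.out.println(user + " -> server " + serverFor(user, servers));
        }
    }
}
```

Because every server holds the full graph in an HA cluster, routing only affects which parts of the graph each server keeps warm in its cache, not correctness. A plain modulo like this reshuffles most users when the server count changes, so a production setup might prefer a more stable mapping.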
Domain Specific Sharding
• Graphs are not as easy to shard as documents or KV stores
• High performance graph databases are limited in the data set size that
a single machine can handle
• Replicas improve speed and availability, but the data set size is still
limited to a single machine’s disk/memory
• No perfect algorithm exists, but an expert’s domain insight helps
Domain Specific Sharding
• Some domains can shard easily (geo, most web apps) using consistent
routing approach and cache sharding
– Geo - where the connections between cities are few compared with the
connections within the cities. So can place cities or countries on different
nodes
• Eventually, at petabyte scale, data cannot practically be replicated
• Need to shard data across machines
References
1. http://www.neo4j.org
2. http://www.neo4j.org/learn/cypher
3. Bachman, Michal (2013). GraphAware - Towards Online Analytical Processing in Graph Databases
http://graphaware.com/assets/bachman-msc-thesis.pdf
4. Hunger, Michael (2012). Cypher and Neo4j http://vimeo.com/83797381
5. Mistry, Deep. Neo4j: A Developer’s Perspective
http://osintegrators.com/opensoftwareintegrators%7Cneo4jadevelopersperspective
6. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs
7. Parallel Breadth First Search on GPU Clusters
8. DB-Engines Ranking of Graph DBMS
Thank You
Check Out My LinkedIn Profile at
https://in.linkedin.com/in/girishkhanzode

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Graph Databases & Neo4J Overview

  • 2. Graph Databases • Graph Based NoSQL Database • Property Graph Model • Neo4j • Neo4j Architecture • Data Storage • Programmatic Data Access • Core API • Lucene • Auto Index lifecycle • Traversers API • Cypher • Graph Algorithms • Neo4j HA • Cache Sharding • References
  • 3. Graphs • A collection of nodes (things) and edges (relationships) that connect pairs of nodes – Suitable for any data that is related • Can attach properties (key-value pairs) to nodes and relationships • Relationships connect two nodes, and both nodes and relationships can hold an arbitrary number of key-value pairs
  • 6. Graphs • Well-understood patterns and algorithms – Studied since Leonhard Euler's Seven Bridges of Königsberg (1736) – Codd's Relational Model (1970) • Knowledge graph - beyond links, search is smarter when considering how things are related • Facebook graph search – people interested in finding things in their part of the world • Bing + Britannica: referencing and cross-referencing • People - relationships to people, to organizations, to places, to things - personal graph
  • 7. A Graph Database • Relationships are first-class citizens • NoSQL database optimized for connected data – Social networking, logistics networks, recommendation engines – Relationships are as important as the records – Claimed to be up to 1000 times faster than an RDBMS for connected data • Uses graph structures with nodes, edges and properties to store data • Open source graph databases - Neo4j, InfiniteGraph, InfoGrid, OrientDB • Very fast querying across records
  • 9. A Graph Database • Transactional with the usual operations • RDBMS - can tell sales in last year • Graph database – can tell customer which book to buy next • Index-free adjacency – Every node holds direct pointers to its adjacent elements • Edges hold most of the important information and relations – nodes to other nodes – nodes to properties
  • 10. Graph Based NoSQL Database • No rigid format of SQL or the tables and columns representation • Uses a flexible graphical representation - addresses scalability concerns • Data can be easily transformed from one model to the other using a graph based NoSQL database • Nodes are organised by some relationships with one another represented by edges between the nodes • Both nodes and the relationships have some defined properties
  • 11. Graph Based NoSQL Database • Labelled, directed, attributed multi-graph - graphs contain nodes that are labelled and carry properties, and the nodes are related to one another through directed edges • While relational database models can replicate graph models, following each edge requires a join, which is costly
  • 12. Advantages • Easier Relationships Analysis • Very fast for associative data sets – Like social networks • Map more directly to object oriented applications – Object classification and Parent->Child relationships
  • 13. Disadvantages • If data is just tabular with not much relationship between the data, graph databases do not fare well • OLAP support for graph databases not mature
  • 14. Performance Experiment • Compute whether a path exists in a social network • 1000 persons • Average 50 friends per person • pathExists(a, b) limited to depth 4 • Results (database: # persons, query time) – Relational database: 1000, 2000 ms – Neo4j: 1000, 2 ms – Neo4j: 1,000,000, 2 ms
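The pathExists(a, b) check from the experiment can be sketched as a depth-limited breadth-first search over an in-memory adjacency map. This is only an illustration of the query being measured, not the code used in the experiment; the class name, the `chain` helper and the use of plain Java collections are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PathExistsSketch {
    // Returns true if 'to' can be reached from 'from' in at most maxDepth hops.
    static boolean pathExists(Map<Integer, List<Integer>> friends, int from, int to, int maxDepth) {
        if (from == to) return true;
        Set<Integer> visited = new HashSet<>();
        visited.add(from);
        List<Integer> frontier = new ArrayList<>();
        frontier.add(from);
        // Expand one level of friendships per iteration, up to maxDepth levels.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<Integer> next = new ArrayList<>();
            for (int person : frontier) {
                for (int friend : friends.getOrDefault(person, Collections.<Integer>emptyList())) {
                    if (friend == to) return true;
                    if (visited.add(friend)) next.add(friend);
                }
            }
            frontier = next;
        }
        return false;
    }

    // Hypothetical helper: builds an undirected chain 0 - 1 - ... - (n - 1).
    static Map<Integer, List<Integer>> chain(int n) {
        Map<Integer, List<Integer>> g = new HashMap<>();
        for (int i = 0; i + 1 < n; i++) {
            g.computeIfAbsent(i, k -> new ArrayList<>()).add(i + 1);
            g.computeIfAbsent(i + 1, k -> new ArrayList<>()).add(i);
        }
        return g;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = chain(7);
        System.out.println(pathExists(g, 0, 4, 4)); // person 4 is 4 hops from person 0
        System.out.println(pathExists(g, 0, 5, 4)); // person 5 is 5 hops from person 0
    }
}
```

Each step only follows references out of the nodes already reached, so the work depends on the neighbourhood explored rather than on the total number of persons stored, which is consistent with the flat Neo4j timings in the results.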
  • 15. Property Graph Model • [Diagram] Node – name: the Doctor, age: 907, species: Time Lord • Node – first name: Rose, last name: Tyler • Node – vehicle: Skoda, model: Type 40
  • 16. Graphs - Whiteboard-friendly • No decomposition, ER design, or normalization / de-normalization, as is needed with an RDBMS
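A minimal sketch of the property graph model in plain Java may help make the diagram concrete: nodes and relationships both carry key-value properties, and each node keeps direct references to its relationships. The class names are hypothetical, and the COMPANION_OF relationship type is borrowed from the Create Relationships slide rather than from this diagram.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PropertyGraphSketch {
    static class Node {
        final Map<String, Object> properties = new HashMap<>();
        // Direct references to attached relationships (no index lookup needed).
        final List<Relationship> relationships = new ArrayList<>();
    }

    static class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, String type, Node end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    // Creates a directed, typed relationship and registers it on both end points.
    static Relationship connect(Node start, String type, Node end) {
        Relationship r = new Relationship(start, type, end);
        start.relationships.add(r);
        end.relationships.add(r);
        return r;
    }

    public static void main(String[] args) {
        Node doctor = new Node();
        doctor.properties.put("name", "the Doctor");
        doctor.properties.put("age", 907);
        doctor.properties.put("species", "Time Lord");

        Node rose = new Node();
        rose.properties.put("first name", "Rose");
        rose.properties.put("last name", "Tyler");

        connect(rose, "COMPANION_OF", doctor); // assumed relationship type

        for (Relationship rel : doctor.relationships) {
            System.out.println(rel.start.properties.get("first name")
                    + " -[" + rel.type + "]-> " + rel.end.properties.get("name"));
        }
    }
}
```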
  • 17. Neo4j • A Graph Database • A Property Graph containing Nodes, Relationships with Properties on both • Manage complex, highly connected data • Scalable - High-performance with High-Availability – Traverse 1,000,000+ relationships / second on commodity hardware • Server with REST API, or Embeddable on the JVM
  • 18. Neo4j • Full ACID transactions • Schema free, bottom-up data model design • Stable • Easier than RDBMS since no need for normalization • Implemented in Java • Open Source
  • 19. Neo4j • Schema free – Data does not have to adhere to any convention • Support for a wide variety of languages - Java, Python, Perl, Scala, Cypher • A graph database can be thought of as a key-value store, with full support for relationships. • Graph databases don’t avoid design efforts • Good design still requires effort
  • 20. Why Neo4J? • The internet is a network of pages connected to each other. What is a better way to model that than in graphs? • No time lost fighting with less expressive data-stores • Easy to implement experimental features • A single instance of Neo4j can house at most 34 billion nodes, 34 billion relationships and 68 billion properties
  • 21. Neo4j Architecture • [Diagram] Components shown: Language bindings (Java, Ruby, Clojure…) • REST API • JVM • Core API • Traversal Framework • Graph Matching • Caches • Memory-Mapped (N)IO • Filesystem
  • 23. Data Storage • Neo4j stores graph data in a number of different store files • Each store file contains the data for a specific part of the graph – neostore.nodestore.db – neostore.relationshipstore.db – neostore.propertystore.db – neostore.propertystore.db.index – neostore.propertystore.db.strings – neostore.propertystore.db.arrays
  • 24. Node Store • Size: 9 bytes – 1st byte - in-use flag – Next 4 bytes - ID of first relationship – Last 4 bytes - ID of first property of node • Fixed size records enable fast lookups
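Fixed-size records make lookups fast because the position of a record is simply its ID multiplied by the record size. The sketch below models the 9-byte node record described above using an in-memory `ByteBuffer`; the class and method names are hypothetical and this is not Neo4j's actual store code.

```java
import java.nio.ByteBuffer;

class NodeStoreSketch {
    static final int RECORD_SIZE = 9; // 1 in-use byte + 4-byte first relationship ID + 4-byte first property ID

    // With fixed-size records the byte offset is computed, never searched for.
    static long offsetOf(long nodeId) {
        return nodeId * RECORD_SIZE;
    }

    static void write(ByteBuffer store, int nodeId, boolean inUse, int firstRel, int firstProp) {
        int pos = (int) offsetOf(nodeId);
        store.put(pos, (byte) (inUse ? 1 : 0));
        store.putInt(pos + 1, firstRel);
        store.putInt(pos + 5, firstProp);
    }

    // Returns {inUse flag, first relationship ID, first property ID}.
    static int[] read(ByteBuffer store, int nodeId) {
        int pos = (int) offsetOf(nodeId);
        return new int[] { store.get(pos), store.getInt(pos + 1), store.getInt(pos + 5) };
    }

    public static void main(String[] args) {
        ByteBuffer store = ByteBuffer.allocate(RECORD_SIZE * 10); // room for 10 node records
        write(store, 3, true, 42, 7);
        int[] record = read(store, 3);
        System.out.println("offset=" + offsetOf(3) + " inUse=" + record[0]
                + " firstRel=" + record[1] + " firstProp=" + record[2]);
    }
}
```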
  • 25. Relationship store • neostore.relationshipstore.db • Size: 33 bytes • 1st byte - In-use flag • Next 8 bytes - IDs of the nodes at the start and end of the relationship • 4 bytes - Pointer to the relationship type • 16 bytes - Pointers to the next and previous relationship records for each of the start and end nodes (the relationship chain) • 4 bytes - ID of the next property
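The next and previous pointers in each record link all relationships of a node into a chain that can be walked without an index. The following sketch lays out the 33 bytes in the order the fields are listed above and follows the "next" pointers for one node. The exact order of the four chain pointers and the -1 end-of-chain marker are assumptions made for illustration, not Neo4j's documented layout.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class RelationshipStoreSketch {
    static final int RECORD_SIZE = 33;
    static final int NO_ID = -1; // assumed end-of-chain marker

    // Field offsets inside one record (pointer order is an assumption).
    static final int IN_USE = 0, START_NODE = 1, END_NODE = 5, TYPE = 9,
            START_PREV = 13, START_NEXT = 17, END_PREV = 21, END_NEXT = 25, NEXT_PROP = 29;

    static void write(ByteBuffer store, int relId, int start, int end, int type,
                      int startNext, int endNext) {
        int base = relId * RECORD_SIZE;
        store.put(base + IN_USE, (byte) 1);
        store.putInt(base + START_NODE, start);
        store.putInt(base + END_NODE, end);
        store.putInt(base + TYPE, type);
        store.putInt(base + START_PREV, NO_ID); // previous pointers unused in this sketch
        store.putInt(base + START_NEXT, startNext);
        store.putInt(base + END_PREV, NO_ID);
        store.putInt(base + END_NEXT, endNext);
        store.putInt(base + NEXT_PROP, NO_ID);
    }

    // Walks the relationship chain of nodeId, starting at its first relationship.
    static List<Integer> chain(ByteBuffer store, int firstRelId, int nodeId) {
        List<Integer> ids = new ArrayList<>();
        int rel = firstRelId;
        while (rel != NO_ID) {
            ids.add(rel);
            int base = rel * RECORD_SIZE;
            // Pick the pointer that belongs to this node's side of the relationship.
            boolean isStart = store.getInt(base + START_NODE) == nodeId;
            rel = store.getInt(base + (isStart ? START_NEXT : END_NEXT));
        }
        return ids;
    }

    public static void main(String[] args) {
        ByteBuffer store = ByteBuffer.allocate(RECORD_SIZE * 3);
        write(store, 0, 0, 1, 0, 1, NO_ID);      // rel 0: node 0 -> node 1, next for node 0 is rel 1
        write(store, 1, 2, 0, 0, NO_ID, 2);      // rel 1: node 2 -> node 0, next for node 0 is rel 2
        write(store, 2, 0, 3, 0, NO_ID, NO_ID);  // rel 2: node 0 -> node 3, end of chain
        System.out.println(chain(store, 0, 0));
    }
}
```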
  • 27. Data Size • nodes: 2^35 (∼ 34 billion) • relationships: 2^35 (∼ 34 billion) • properties: 2^36 to 2^38 depending on property types (maximum ∼ 274 billion, always at least ∼ 68 billion) • relationship types: 2^15 (∼ 32 000)
  • 28. Neo4j API – Logical View
  • 29. Programmatic Data Access • Java APIs - JVM languages bind to the same APIs • JRuby, Jython, Clojure, Scala… • Manage nodes and relationships • Indexing – find data without traversal • Traversing • Path finding • Pattern matching
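The limits on this slide are powers of two (35, 36 to 38, and 15 bits of ID space). As a quick arithmetic check, the rounded figures can be reproduced with a bit shift; the class name is hypothetical.

```java
class CapacitySketch {
    // Number of distinct IDs addressable with the given number of bits.
    static long ids(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        System.out.println("nodes / relationships: " + ids(35));
        System.out.println("properties (min):      " + ids(36));
        System.out.println("properties (max):      " + ids(38));
        System.out.println("relationship types:    " + ids(15));
    }
}
```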
  • 30. Core API • Deals with graphs in terms of their fundamentals • Nodes - properties – KV Pairs • Relationships – Start node – End node – Properties • KV Pairs
  • 31. Create Node
      GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo");
      Transaction tx = db.beginTx();
      try {
          Node theDoctor = db.createNode();
          theDoctor.setProperty("character", "the Doctor");
          tx.success();   // mark the transaction as successful
      } finally {
          tx.finish();    // commit (or roll back if success() was not called)
      }
  • 32. Create Relationships
      Transaction tx = db.beginTx();
      try {
          Node theDoctor = db.createNode();
          theDoctor.setProperty("character", "The Doctor");
          Node susan = db.createNode();
          susan.setProperty("firstname", "Susan");
          susan.setProperty("lastname", "Campbell");
          // directed relationship: susan -[COMPANION_OF]-> theDoctor
          susan.createRelationshipTo(theDoctor, DynamicRelationshipType.withName("COMPANION_OF"));
          tx.success();
      } finally {
          tx.finish();
      }
  • 33. Index a Graph • Graphs themselves are indexes • Can create short-cuts to well-known nodes • In program, keep a reference to any interesting node • Indexes offer flexibility in what constitutes an “interesting node”
  • 34. Lucene • The default index implementation for Neo4j – Default implementation for IndexManager • Supports many indexes per database • Each index supports nodes or relationships • Supports exact and regex-based matching • Supports scoring – Number of hits in the index for a given item – Great for recommendations
  • 35. Create a Node Index
      GraphDatabaseService db = …
      Index<Node> planets = db.index().forNodes("planets");
      – Index<Node> / forNodes: the type of entity indexed – "planets": the index name – forNodes creates the index, or retrieves it if it already exists
  • 36. Create a Relationship Index
      GraphDatabaseService db = …
      Index<Relationship> enemies = db.index().forRelationships("enemies");
      – Index<Relationship> / forRelationships: the type of entity indexed – "enemies": the index name – forRelationships creates the index, or retrieves it if it already exists
  • 37. Exact Matches
      GraphDatabaseService db = …
      Index<Node> actors = db.index().forNodes("actors");
      Node rogerDelgado = actors.get("actor", "Roger Delgado").getSingle();
      – "actor": the key – "Roger Delgado": the value to match – getSingle(): first match only
  • 38. Query Matches
      GraphDatabaseService db = …
      Index<Node> species = db.index().forNodes("species");
      IndexHits<Node> speciesHits = species.query("species", "S*n");
      – "species": the key – "S*n": the query
  • 39. Transactions to Mutate Indexes • Mutating access is still protected by transactions, which cover both index and graph
      GraphDatabaseService db = …
      Transaction tx = db.beginTx();
      try {
          Node nixon = db.createNode();
          nixon.setProperty("character", "Richard Nixon");
          db.index().forNodes("characters").add(nixon, "character", nixon.getProperty("character"));
          tx.success();
      } finally {
          tx.finish();
      }
  • 40. Auto Index lifecycle • Auto Index - stays consistent with the graph data • Specify the property name to index at creation time • If a node, relationship or property is removed from the graph, it is removed from the index • If the database is started with auto indexing enabled but with different auto-indexed properties than in the last run, already auto-indexed entities will be deleted from the index as they are worked upon • Re-indexing is manual – Existing properties are not indexed unless touched
  • 41. Auto Index lifecycle
      AutoIndexer<Node> nodeAutoIndex = graphDb.index().getNodeAutoIndexer();
      nodeAutoIndex.startAutoIndexingProperty("species");
      nodeAutoIndex.setEnabled(true);
      ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
      – Node and relationship indexes are supported
  • 42. Core API • Basic (nodes, relationships) • Fast • Imperative • Flexible - Easily intermix mutating operations
  • 43. Traversers API • Mechanisms to query graph navigating from starting node to related nodes according to algorithm to get answers • Expressive • Fast • Declarative (mostly) • Opinionated
  • 44. Cypher - A Graph Query Language • Query Language for Neo4j • A declarative graph pattern matching language – SQL for graphs – Tabular results • aggregation, ordering and limits • Mutating operations • CRUD • Easy to formulate queries based on relationships • Many features stem from improving pain points of SQL like join tables
  • 45. Cypher - A Graph Query Language
  • 48. Operations • Aggregation - COUNT, SUM, AVG, MAX, MIN, COLLECT • Where clause
      start doctor=node:characters(name = 'Doctor')
      match (doctor)<-[:PLAYED]-(actor)-[:APPEARED_IN]->(episode)
      where actor.actor = 'Tom Baker' and episode.title =~ /.*Dalek.*/
      return episode.title
    • Ordering – order by <property> – order by <property> desc
  • 49. Graph Algorithms • Neo4j has built-in algorithms • Callable through JVM and REST APIs • Higher level of abstraction • Graph Matching – Look for patterns in a data set - retail analytics – Higher-level abstraction than raw traversers • REST API – Access the server • Binary protocol – JSON as default format
  • 50. Neo4j HA - High Availability Cluster • A scalability package known as high availability or HA that uses a master-slave cluster architecture – Full data redundancy – Service fault tolerance – Linear read scalability – Master-slave replication • Single data-centre or global zones – tolerance for high-latency
  • 51. Neo4j HA • Redundancy - improved uptime – automatic failover • In a Neo4j HA cluster the full graph is replicated to each instance in the cluster. • Full dataset is replicated across the entire cluster to each server • Read operations can be done locally on each slave • Read capacity of the HA cluster increases linearly with the number of servers
  • 53. HA Cluster Architecture • Cluster performs automatic master election • Supports master-slave replication for clustering and DR across sites
  • 55. Write to a Master • All write operations are co-ordinated by the master • Writes to the master are fast • Slaves eventually catch up
  • 56. Write to a Master
  • 57. Write to a Slave • Writes to a slave cause a synchronous transaction with the master • Other slaves eventually catch up
  • 58. Write to a Slave
  • 59. Server Overload Problem • Unlike other classes of NoSQL database, a graph does not have predictable lookup since it is a highly mutable structure • We want to co-locate related nodes for traversal performance, but we don’t want to place so many connected nodes on the same database that it becomes heavily loaded • The black-hole problem - popular nodes get lumped together on a single instance, but there is low point cut
  • 61. Thinly Spread Network • The opposite is also true, that we don’t want too widely connected nodes across different database instances since it will incur a substantial performance penalty at runtime as traversals cross the (relatively latent) network • Load-leveling alone can lead to many relationships crossing instances • These are very expensive to traverse, networks are many orders of magnitude slower than in-memory traversals
  • 63. Minimal Point Cut • The best approach is to balance a graph across database instances by creating a minimum point cut for a graph, where graph nodes are placed such that there are few relationships that span shards • A good strategy is to take a local view of the graph (no global locks) and work incrementally (short bursts) • Take into account use patterns • Unlike other NoSQL stores, graphs are not predictable, so we cannot use techniques like consistent hashing for scale out
  • 65. Cache Sharding • A strategy for large data sets of terabyte scale • Mandates consistent request routing • For instance, requests for user A are always sent to server 1, while requests for user B are always sent to server 2 and so on • The key assumption is that requests for user A typically touch parts of the graph around user A, such as his or her friends, preferences, likes and so on
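The quantity being minimised can be illustrated by counting the relationships whose two end nodes live on different shards for a given placement. The sketch below only measures the cut for two hand-made placements; it does not search for the minimum, and the class, the helper names and the example graph are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class PointCutSketch {
    // Counts relationships whose end points are placed on different shards.
    static int crossShardRelationships(int[][] relationships, Map<Integer, Integer> shardOf) {
        int cut = 0;
        for (int[] r : relationships) {
            if (!shardOf.get(r[0]).equals(shardOf.get(r[1]))) cut++;
        }
        return cut;
    }

    // Hypothetical helper: shards[i] is the shard that node i is placed on.
    static Map<Integer, Integer> assign(int... shards) {
        Map<Integer, Integer> m = new HashMap<>();
        for (int node = 0; node < shards.length; node++) m.put(node, shards[node]);
        return m;
    }

    public static void main(String[] args) {
        // Two triangles {0,1,2} and {3,4,5} joined by the single relationship 2-3.
        int[][] rels = { {0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3} };

        // Placement that follows the clusters: one triangle per shard.
        System.out.println(crossShardRelationships(rels, assign(0, 0, 0, 1, 1, 1)));

        // Placement that only balances load: nodes alternate between shards.
        System.out.println(crossShardRelationships(rels, assign(0, 1, 0, 1, 0, 1)));
    }
}
```

Both placements put three nodes on each shard, but the first cuts one relationship and the second cuts five, which is the difference between a good point cut and load-levelling alone.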
  • 66. Cache Sharding • This means that the neighbourhood of the graph around user A will be cached on server 1, while the neighbourhood around user B will be cached on server 2 • By employing consistent routing of requests, the caches of all servers in the HA cluster can be utilized maximally • Strategy is highly effective for managing a large graph that does not fit in RAM
  • 67. Consistent Routing • Always try to route related requests to the same server to hopefully benefit from warm caches
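One simple way to route related requests to the same server is to derive the server from a stable key such as the user ID. The sketch below uses a hash of the key modulo the number of servers; this is an assumed, minimal routing rule for illustration (it is not a consistent-hashing ring and not a Neo4j feature), and the class and method names are hypothetical.

```java
class ConsistentRoutingSketch {
    // Maps a user ID to a server index in [0, serverCount).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int serverFor(String userId, int serverCount) {
        return Math.floorMod(userId.hashCode(), serverCount);
    }

    public static void main(String[] args) {
        int servers = 3;
        // Repeated requests for the same user always land on the same server,
        // so that server's cache stays warm for the user's neighbourhood.
        System.out.println("userA -> server " + serverFor("userA", servers));
        System.out.println("userA -> server " + serverFor("userA", servers));
        System.out.println("userB -> server " + serverFor("userB", servers));
    }
}
```

A load balancer in front of the HA cluster could apply a rule of this kind; with plain modulo routing, changing the number of servers remaps most users and temporarily cools the caches.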
  • 68. Domain Specific Sharding • Not easy to shard graphs like documents or KV stores • High-performance graph databases are limited in the data set size that can be handled by a single machine • Replicas speed up reads and improve availability, but the data set size is still limited to a single machine’s disk/memory • No perfect algorithm exists, but the insight of a domain expert helps
  • 69. Domain Specific Sharding • Some domains can shard easily (geo, most web apps) using the consistent routing approach and cache sharding – Geo - the connections between cities are few compared with the connections within the cities, so cities or countries can be placed on different nodes • Eventually, petabyte-scale data cannot practically be replicated • Need to shard data across machines
  • 70. References 1. http://www.neo4j.org 2. http://www.neo4j.org/learn/cypher 3. Bachman, Michal (2013). GraphAware - Towards Online Analytical Processing in Graph Databases. http://graphaware.com/assets/bachman-msc-thesis.pdf 4. Hunger, Michael (2012). Cypher and Neo4j. http://vimeo.com/83797381 5. Mistry, Deep. Neo4j: A Developer’s Perspective. http://osintegrators.com/opensoftwareintegrators%7Cneo4jadevelopersperspective 6. MapGraph: A High Level API for Fast Development of High Performance Graph Analytics on GPUs 7. Parallel Breadth First Search on GPU Clusters 8. DB-Engines Ranking of Graph DBMS
  • 71. Thank You • Check out my LinkedIn profile at https://in.linkedin.com/in/girishkhanzode