Graph Databases
Karol Grzegorczyk
June 10, 2014
Graph Theory
Seven Bridges of Königsberg problem
solved by Leonhard Euler in 1735
How to find a walk through the city that would
cross each bridge once and only once?
[© Google]
Euler proved that it is impossible to solve
this problem!
G = (V, E)
E ⊆ V × V
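Euler's argument can be checked mechanically. The sketch below assumes the usual abstraction of the city into four land masses (the labels A to D are chosen here for illustration) joined by seven bridges. It relies on the standard result that a connected multigraph has a walk using every edge exactly once only if zero or two of its vertices have odd degree.

```python
from collections import Counter

# Seven bridges between four land masses (labels A-D are illustrative).
# A is the island touched by five bridges; B, C and D are touched by three each.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_eulerian_walk(edges):
    """Degree test only; assumes the graph is connected."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)


print(dict(degrees(BRIDGES)))
print(has_eulerian_walk(BRIDGES))
```

All four land masses have odd degree, so the test fails, which is exactly Euler's conclusion.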
Storing Connected Data in a Relational Database
● Relationships do exist in relational databases, but only as a means of joining tables
● Logically, a join creates a Cartesian product of tables that is then filtered by the join condition
● Operations in relational databases are index-intensive. Index-based retrieval is fast, but not constant-time (most often O(log n))
● Traversal queries require hierarchical joins, which are costly. Deep traversal queries quickly become infeasible: execution time can grow exponentially with the depth of the join.
● For a given SQL query, the RDBMS builds an in-memory graph data structure.
● Relational databases are often normalized in order to organize data efficiently.
● Normalization increases the number of joins needed to query the database. Denormalization can be a partial solution.
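To make the cost argument concrete, the sketch below emulates a friends-of-friends query over a relational-style join table. The table contents, the function names and the sorted-list "index" are assumptions made for illustration, not how any particular RDBMS is implemented. Each additional hop is one more self-join, and each row of the previous result costs one O(log n) index probe.

```python
from bisect import bisect_left, bisect_right

# A "knows" join table of (person_id, friend_id) rows, kept sorted so that a
# binary search can play the role of a B-tree index on person_id.
KNOWS = sorted([(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 6)])
KEYS = [row[0] for row in KNOWS]


def index_lookup(person_id):
    """Return friend ids via an O(log n) probe into the sorted index."""
    lo = bisect_left(KEYS, person_id)
    hi = bisect_right(KEYS, person_id)
    return [friend for _, friend in KNOWS[lo:hi]]


def friends_at_depth(start, depth):
    """Emulate `depth` self-joins of the table and count the index probes."""
    frontier, probes = [start], 0
    for _ in range(depth):
        next_frontier = []
        for person in frontier:
            probes += 1
            next_frontier.extend(index_lookup(person))
        frontier = next_frontier
    return sorted(set(frontier)), probes


print(friends_at_depth(1, 2))
print(friends_at_depth(1, 3))
```

The probe count grows with the size of each intermediate result, which is why deep traversals become expensive when every hop goes through an index.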
Database normalization
● Database normalization is the process of organizing the fields and tables of a relational database to
minimize redundancy.
– Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining
relationships between them.
● Normal forms
– The first normal form (each attribute contains only atomic values)
– The second normal form (each non-key attribute depends on the whole primary key)
– The third normal form (each non-key attribute depends on nothing but the primary key)
● A relational database table is often described as "normalized" if it is in the third normal form (3NF).
● When a database is intended for OLAP rather than OLTP, it is typically denormalized.
● Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data.
● Examples of denormalization techniques:
– Materialised views
– Star schemas
– OLAP cubes
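A small sketch of the trade-off, using a hypothetical orders table (all table and column names are invented for illustration). In the wide table the city depends on the customer rather than on the order key, so it is repeated; normalizing removes the repetition at the price of a join, and denormalizing performs that join once so that reads do not have to.

```python
# A denormalized "orders" table: the customer's city is repeated per order.
ORDERS_WIDE = [
    {"order_id": 1, "customer": "Anna", "city": "Krakow", "item": "book"},
    {"order_id": 2, "customer": "Anna", "city": "Krakow", "item": "pen"},
    {"order_id": 3, "customer": "Jan", "city": "Warsaw", "item": "lamp"},
]


def normalize(rows):
    """Split into customers (customer -> city) and orders without the city."""
    customers = {r["customer"]: r["city"] for r in rows}
    orders = [
        {"order_id": r["order_id"], "customer": r["customer"], "item": r["item"]}
        for r in rows
    ]
    return customers, orders


def denormalize(customers, orders):
    """Join the two tables back into the wide, read-optimized form."""
    return [{**o, "city": customers[o["customer"]]} for o in orders]


customers, orders = normalize(ORDERS_WIDE)
print(customers)
print(denormalize(customers, orders) == ORDERS_WIDE)
```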
Graph Database Highlights
● Graph data stores provide index-free adjacency: each node holds direct references to its neighbours, so a hop does not require an index lookup. For traversal queries this results in much better performance than a traditional RDBMS.
● Designed predominantly for traversal performance and for executing graph algorithms
● A graph database is a more natural, direct representation of a domain than an RDBMS (no need for junction tables)
● There is no need to join tables because the data structure is already “joined” by the edges that are defined.
● In graph databases denormalization is not needed!
● Graph diagrams tend to contain specific instances of nodes and relationships, rather than classes or archetypes.
● The main purposes of graph databases are the analysis and visualization of graph data.
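Index-free adjacency can be illustrated with plain object references. This is a minimal in-memory sketch, not Neo4j's storage engine; the `Node` class and the helper names are assumptions. The point is that following a relationship is a pointer dereference, so the cost of a traversal depends on the part of the graph that is visited rather than on the total number of nodes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so nodes can live in a set
class Node:
    name: str
    # Direct references to neighbouring nodes: no index lookup per hop.
    neighbours: list = field(default_factory=list)


def connect(a, b):
    """Create an undirected link by storing each node in the other's list."""
    a.neighbours.append(b)
    b.neighbours.append(a)


def within_hops(start, hops):
    """Names of nodes reachable in at most `hops` steps (breadth-first)."""
    seen = {start}
    frontier, found = [start], []
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for other in node.neighbours:
                if other not in seen:
                    seen.add(other)
                    found.append(other.name)
                    next_frontier.append(other)
        frontier = next_frontier
    return found


anna, jan, ewa, piotr = (Node(n) for n in ("Anna", "Jan", "Ewa", "Piotr"))
connect(anna, jan)
connect(jan, ewa)
connect(ewa, piotr)
print(within_hops(anna, 2))
```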
Graph Database Models
● The Property Graph Model
– The model is built of nodes and relationships
– Nodes contain key-value properties; relationships can contain them as well.
– Relationships are named and directed, and always have a start node and an end node
● Hypergraphs
– A generalization of the graph model.
– A relationship (hyperedge) can have any number of nodes at either end (many-to-many relationships)
● Triple stores
– A triple expresses a relationship between two resources.
– The triple is a subject-predicate-object data structure, e.g. Fred likes ice cream
Triple stores
● The Resource Description Framework (RDF) is a framework for expressing
information about resources.
● Resources can be anything, including documents, people, physical objects, and
abstract concepts.
● RDF is intended for situations in which information on the Web needs to be processed
by applications, rather than being only displayed to people.
● RDF is a building block of the Semantic Web movement.
● RDF is a set of W3C specifications
– SPARQL (SPARQL Protocol and RDF Query Language) is the query language for RDF
● Disadvantages
– Lack of index-free adjacency. Data is stored in the form of triples, which are independent artifacts. In order to traverse the graph, one needs to join multiple triples.
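The disadvantage can be seen in a toy triple store. The sketch below is an illustration only (a real RDF store indexes its triples, and the data and function names here are invented): because every triple is an independent record, a two-hop traversal has to be expressed as a join between two pattern matches.

```python
# Each triple is an independent (subject, predicate, object) record.
TRIPLES = [
    ("Fred", "likes", "ice cream"),
    ("Fred", "knows", "Anna"),
    ("Anna", "knows", "Jan"),
    ("Jan", "likes", "surfing"),
]


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def knows_two_hops(person):
    """Traverse two 'knows' edges by joining two scans of the triple set."""
    return [
        o2
        for (_, _, o1) in match(s=person, p="knows")
        for (_, _, o2) in match(s=o1, p="knows")
    ]


print(match(p="likes"))
print(knows_two_hops("Fred"))
```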
RDF example
[G. Schreiber, Y. Raimond, RDF 1.1 Primer, W3C, 2014]
In RDF, resources are identified by IRIs (Internationalized Resource Identifiers).
RDF defines logical relationships. A number of different serialization formats exist for writing down RDF graphs:
● Turtle
● JSON-LD
● RDFa
● RDF/XML
Popular RDF datasets:
● Wikidata
● DBpedia
● WordNet
● Europeana
● VIAF
Hypergraphs
[I. Robinson, J. Webber, E. Eifrem, Graph Databases, O’Reilly Media, 2013]
HyperGraphDB
http://www.hypergraphdb.org
Using hypergraphs we lose the ability to add
properties to the individual relationships.
The Property Graph Model
● The most popular variant of the graph model
● Relationships are binary: each connects exactly one start node to one end node
● Property graph databases are typically schema-less; there is no fixed database schema.
● Querying is often done in a query-by-example style, i.e. by finding data (nodes and relationships) matching a specified pattern.
● Optimized for traversal
● Popular solutions:
– Neo4j (pure graph DBMS)
– OrientDB (hybrid document and graph DBMS)
Neo4j
● Written in Java, using some high-performance features of the JVM
● Concepts:
– Nodes (can have zero or more properties)
– Relationships (always have a direction and a type; can have zero or more properties)
– Labels for grouping nodes together (a node can have zero or more labels; in the web GUI each label is assigned a color)
● Neo4j is a schema-optional graph database (since version 2.0). There are two schema elements:
– Indexes - you can create an index on a set of properties of nodes with a specific label (Apache Lucene)
– Constraints - a constraint (currently only uniqueness) on a property of nodes with a given label (an index is added automatically)
● Two versions/modes:
– Web server with a RESTful API and a rich web GUI
– Embedded Java library
● The RESTful API was designed with discoverability in mind. Just start with a GET on the service root (e.g. http://localhost:7474/db/data) and you will get a list of hyperlinks to the available resources.
Cypher Query Language basics
● Cypher is a declarative query language based on pattern matching
● Basic SQL syntax structure:
SELECT columns FROM table WHERE conditions
● Basic Cypher syntax structure:
MATCH pattern WHERE conditions RETURN nodes
● Patterns are written as ASCII-art graphs, e.g.:
MATCH (x)-->(y) RETURN x
● It is possible to create data with Cypher as well:
CREATE ({key:"value"})
Cypher basic examples
● Create a simple node
CREATE ({name:"Anna"})
● Retrieve all the nodes
MATCH (x) RETURN x
● Create a labeled node with some properties
CREATE (x:Person {name:"Jan", from:"Poland"})
● Retrieve all the nodes labeled Person whose property from is "Poland"
MATCH (y:Person) WHERE y.from = "Poland" RETURN y
● Create a relationship from Anna to every Person node
MATCH (x) WHERE x.name = "Anna"
MATCH (y:Person)
CREATE (x)-[:knows]->(y)
Traversal queries
● Find Jan's friends. Return him and his friends.
MATCH (x:Person)-[:knows]-(friends)
WHERE x.name = "Jan"
RETURN x, friends
● Find friends of Jan's friends who like surfing
MATCH (x:Person)-[:knows]-()-[:knows]-(surfer)
WHERE x.name = "Jan"
AND surfer.hobby = "surfing"
RETURN DISTINCT surfer
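What the second pattern asks the database to do can be spelled out by hand. The sketch below evaluates the same friends-of-friends pattern over a tiny in-memory data set; the people, hobbies and helper names are invented for illustration, and the code only mirrors the semantics of the query, not how Cypher executes it.

```python
PEOPLE = {
    "Jan": {"hobby": "chess"},
    "Anna": {"hobby": "surfing"},
    "Ewa": {"hobby": "surfing"},
    "Piotr": {"hobby": "running"},
}
# Undirected reading of :knows, as in the pattern -[:knows]- without an arrow.
KNOWS = [("Jan", "Anna"), ("Anna", "Ewa"), ("Jan", "Piotr"), ("Piotr", "Ewa")]


def neighbours(name):
    """People connected to `name` by a knows relationship, in either direction."""
    return [b if a == name else a for a, b in KNOWS if name in (a, b)]


def surfing_friends_of_friends(start):
    result = []
    for friend in neighbours(start):
        for candidate in neighbours(friend):
            # A relationship is not reused within one matched pattern, so the
            # path never walks back along the edge it has just followed.
            if candidate == start:
                continue
            if PEOPLE[candidate]["hobby"] == "surfing" and candidate not in result:
                result.append(candidate)  # plays the role of DISTINCT
    return result


print(surfing_friends_of_friends("Jan"))
```

Ewa is reachable through both Anna and Piotr but is reported once, which is the effect of DISTINCT; Anna surfs too, but she is a direct friend and no two-hop path leads to her.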
Starting points
● Patterns often have starting points, i.e. nodes or relationships that are explicitly given.
● It is possible to specify the starting point using a WHERE clause (as on the previous slide), but this can be inefficient when there are no indexes.
● A more explicit way of specifying the starting point (node or relationship) is the START keyword.
● These starting points are obtained via index lookups or, more rarely, accessed directly by node or relationship ID:
– START n=node:index-name(key = "value")
– START n=node(id)
START clause example
Find the mutual friends of the user named “Michael”
[I. Robinson, J. Webber, E. Eifrem, Graph Databases, O’Reilly Media, 2013]
START a=node:user(name='Michael')
MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
RETURN b, c
D3.js based graph visualization of the example data set
Transaction management
● Neo4j provides full ACID support
● All relationships must have a valid start node and end node. In effect, trying to delete a node that still has relationships attached will throw an exception upon commit.
● When updating or inserting massive amounts of data, the periodic commit query hint (USING PERIODIC COMMIT) can be helpful.
● Currently only one isolation level (READ_COMMITTED) is supported.
● In order to execute a query inside an open transaction, POST the query to http://localhost:7474/db/data/transaction/{id}
Native Graph Storage
There are separate stores for nodes, relationships and properties. In order to be able to compute a
record’s location at cost O(1), all stores are fixed-size record stores.
Nodes (9 bytes)
Relationships are stored in doubly linked lists, so firstPrevRelId, firstNextRelId, secondPrevRelId and
secondNextRelId are pointers for the next and previous relationship records for the start and end nodes
[I. Robinson, J. Webber, E. Eifrem, Graph Databases, O’Reilly Media, 2013]
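The O(1) lookup follows directly from the fixed record size: the byte offset of a record is its ID multiplied by the record length. The sketch below shows this for a 9-byte node record. The field layout used here (a 1-byte in-use flag, a 4-byte ID of the first relationship and a 4-byte ID of the first property) is an assumption based on the description in Robinson et al. and should be read as illustrative, not as the exact on-disk format.

```python
import struct

# Assumed layout: in-use flag (1 byte), first relationship id (4), first property id (4).
NODE_RECORD = struct.Struct(">BII")
assert NODE_RECORD.size == 9


def write_node(store, node_id, first_rel, first_prop):
    """Write a record at the offset computed from the node id."""
    offset = node_id * NODE_RECORD.size  # O(1): arithmetic, no search
    end = offset + NODE_RECORD.size
    if len(store) < end:
        store.extend(bytes(end - len(store)))  # grow the store with zero bytes
    NODE_RECORD.pack_into(store, offset, 1, first_rel, first_prop)


def read_node(store, node_id):
    """Read a record back by jumping straight to its offset."""
    in_use, first_rel, first_prop = NODE_RECORD.unpack_from(
        store, node_id * NODE_RECORD.size
    )
    return {"in_use": bool(in_use), "first_rel": first_rel, "first_prop": first_prop}


store = bytearray()
write_node(store, 100, first_rel=7, first_prop=42)
print(len(store))
print(read_node(store, 100))
```

Node 100 lives at byte 900 regardless of how many other nodes exist, and the stored relationship ID is in turn an offset into the relationship store, which is how a traversal can proceed without an index.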
Scalability
● On a single server, Neo4j is capable of managing 34 × 10^9 nodes
● Currently, only full database replication, for read-only purposes, is available
– Master-slave architecture to support fault tolerance
– Horizontal scaling for read-mostly workloads
● Open transactions are not shared among members of an HA cluster. Therefore, if you use the transactional endpoint in an HA cluster, you must ensure that all requests for a given transaction are sent to the same Neo4j instance.
● As stated earlier, data in a graph database are already “joined”, so it is hard to partition (shard) a graph across multiple machines.
● The Neo4j team is working on this, but it is not ready yet. Ideally, tightly connected nodes (or nodes belonging to a common domain) would be kept together on the same machine, and loosely connected nodes (or nodes belonging to different domains) on separate machines.
● The problem is that a connection that is currently loose can become tight in the future, and vice versa.
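The sharding difficulty can be quantified by counting the relationships that cross machine boundaries, since each such relationship turns a local hop into a network call. The sketch below compares two placements of a small, invented graph on two machines; the graph, the placements and the function name are assumptions for illustration.

```python
# Two tightly knit triangles (a-b-c and d-e-f) joined by a single relationship.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]


def cut_size(edges, placement):
    """Number of relationships whose endpoints live on different machines."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


# Placement that keeps each tightly connected group on one machine.
by_cluster = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
# Naive placement that alternates nodes between the two machines.
by_alternation = {n: i % 2 for i, n in enumerate("abcdef")}

print(cut_size(EDGES, by_cluster))
print(cut_size(EDGES, by_alternation))

# A loose connection becoming tight: new relationships between the groups
# degrade a placement that used to be good.
print(cut_size(EDGES + [("a", "e"), ("b", "f")], by_cluster))
```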
Graph algorithms
● Both graph theory and graph algorithms are mature and well-understood fields of computing science, and both can be used to mine sophisticated information from graph databases.
● Neo4j supports both depth-first and breadth-first search
– The search type can be specified using BranchSelector and BranchOrderingPolicy
● Graph algorithms available in Neo4j
– all paths (find all paths between two nodes)
– all simple paths (find paths with no repeated nodes)
– shortest paths (find paths with the fewest relationships)
● Can find all shortest paths (if there is more than one) or just the first one.
– Dijkstra (find paths with the lowest cost)
– A* (Dijkstra's algorithm guided by a heuristic estimate of the remaining cost)
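The difference between "fewest relationships" and "lowest cost" is easiest to see on a weighted example. Below is a minimal, textbook-style sketch of Dijkstra's algorithm using a binary heap. It is not Neo4j's implementation; the graph and the function name are invented, and non-negative weights are assumed.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) of a cheapest path, or (inf, []) if unreachable."""
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue  # a cheaper route to this node was already processed
        settled.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Directed, weighted graph: node -> {neighbour: cost of the relationship}.
GRAPH = {
    "A": {"B": 1, "C": 5},
    "B": {"C": 1, "D": 4},
    "C": {"D": 1},
    "D": {},
}

print(dijkstra(GRAPH, "A", "D"))
```

Here the cheapest path A, B, C, D uses three relationships and costs 3, while the two-relationship paths through B or C alone cost 5 and 6, so a shortest-path search and Dijkstra return different answers.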
Example of finding the shortest path using REST API
Example request
POST http://localhost:7474/db/data/node/35/path
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
  "to" : "http://localhost:7474/db/data/node/30",
  "max_depth" : 3,
  "relationships" : {
    "type" : "to",
    "direction" : "out"
  },
  "algorithm" : "shortestPath"
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
  "start" : "http://localhost:7474/db/data/node/35",
  "nodes" : [
    "http://localhost:7474/db/data/node/35",
    "http://localhost:7474/db/data/node/31",
    "http://localhost:7474/db/data/node/30"
  ],
  "length" : 2,
  "relationships" : [
    "http://localhost:7474/db/data/relationship/26",
    "http://localhost:7474/db/data/relationship/32"
  ],
  "end" : "http://localhost:7474/db/data/node/30"
}
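A client for this call mostly has to assemble the JSON body and read the node URIs out of the response. The sketch below does both without contacting a server. The function names are hypothetical, the URL and field names are taken from the example above, and the sample response is an abridged copy of the one shown.

```python
import json


def shortest_path_request(base, from_id, to_id, rel_type, max_depth=3):
    """Build (url, body) for the path-finding call; nothing is sent."""
    url = f"{base}/node/{from_id}/path"
    body = {
        "to": f"{base}/node/{to_id}",
        "max_depth": max_depth,
        "relationships": {"type": rel_type, "direction": "out"},
        "algorithm": "shortestPath",
    }
    return url, json.dumps(body)


def node_ids(response_text):
    """Extract the numeric node ids, in path order, from a response body."""
    doc = json.loads(response_text)
    return [int(uri.rsplit("/", 1)[1]) for uri in doc["nodes"]]


SAMPLE_RESPONSE = """{
  "start": "http://localhost:7474/db/data/node/35",
  "nodes": ["http://localhost:7474/db/data/node/35",
            "http://localhost:7474/db/data/node/31",
            "http://localhost:7474/db/data/node/30"],
  "length": 2,
  "end": "http://localhost:7474/db/data/node/30"
}"""

url, body = shortest_path_request("http://localhost:7474/db/data", 35, 30, "to")
print(url)
print(node_ids(SAMPLE_RESPONSE))
```

The path length reported by the server is the number of relationships, so it is always one less than the number of nodes returned.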
Spring Data Neo4J
Spring Data is an umbrella project that makes it easy to use new data access technologies,
such as non-relational databases, map-reduce frameworks, and cloud based data services.
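Spring Data Neo4j is an integration library for Neo4j, and it was the first Spring Data project.
// Annotations come from org.springframework.data.neo4j.annotation;
// FULLTEXT and INCOMING are assumed to be statically imported.
@NodeEntity
public class Movie {
    @GraphId Long id;

    @Indexed(type = FULLTEXT, indexName = "search")
    String title;

    Person director;

    @RelatedTo(type = "ACTS_IN", direction = INCOMING)
    Set<Person> actors;

    // A Java string literal cannot span lines, so the query is concatenated.
    @Query("start movie=node({self}) " +
           "match movie-->genre<--similar " +
           "return similar")
    Iterable<Movie> similarMovies;
}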
Bibliography
● I. Robinson, J. Webber, E. Eifrem, Graph Databases, O’Reilly Media, 2013
● R. Angles, C. Gutierrez, Survey of graph database models, ACM Computing Surveys (CSUR), 2008
● M. A. Rodriguez, P. Neubauer, The Graph Traversal Pattern, Graph Data Management: Techniques and Applications, 2011
● J. Partner, A. Vukotic, N. Watt, Neo4j in Action, Manning, 2014
● E. Redmond, J. R. Wilson, Seven Databases in Seven Weeks, The Pragmatic Bookshelf, 2012
● G. Schreiber, Y. Raimond, RDF 1.1 Primer, W3C, 2014
Thank you!

(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 

Graph databases

  • 2. Graph Theory
    ● Seven Bridges of Königsberg problem, studied by Leonhard Euler in 1735: how to find a walk through the city that crosses each bridge once and only once? [© Google]
    ● Euler proved that no such walk exists.
    ● A graph is a pair G = (V, E), where E ⊆ V × V.
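Euler's argument can be checked mechanically. The sketch below assumes the usual multigraph reading of the puzzle (four land masses, seven bridges); the vertex labels A to D are illustrative, and the function assumes the graph is connected. It applies the standard criterion: a walk using every edge exactly once exists only if zero or two vertices have odd degree.

```python
from collections import Counter

# The seven bridges as edges of a multigraph over four land masses.
# Labels are illustrative: A is the central island, B, C and D the other banks.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_walk(edges):
    """For a connected multigraph: True iff 0 or 2 vertices have odd degree."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)
```

All four land masses have odd degree, so `has_euler_walk(BRIDGES)` is false, which matches Euler's result.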
  • 3. Storing Connected Data in a Relational Database
    ● Relationships do exist in relational databases, but only as a means of joining tables.
    ● Logically, a join creates a Cartesian product of the tables (restricted by the join condition).
    ● Operations of relational databases are index-intensive. Retrieval based on an index is fast, but not constant-time (most often O(log₂ n)).
    ● Traversal queries require hierarchical joins, which are costly. Deep traversal queries are infeasible: execution time increases exponentially with the depth of the join.
    ● For a given SQL query, the RDBMS creates an in-memory graph data structure.
    ● Relational databases are often normalized in order to organize data efficiently.
    ● Normalization increases the number of joins needed to query the database. Denormalization can be a partial solution.
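To make the join cost concrete, here is a minimal sketch using an in-memory SQLite database. The `person` and `knows` tables and their contents are hypothetical. A two-hop "friends of friends" query already needs the junction table twice, and each further hop would add another self-join.

```python
import sqlite3

# Hypothetical schema: people plus a junction table for "knows".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE knows (src INTEGER REFERENCES person(id),
                        dst INTEGER REFERENCES person(id));
    INSERT INTO person VALUES (1, 'Jan'), (2, 'Anna'), (3, 'Piotr'), (4, 'Ewa');
    INSERT INTO knows VALUES (1, 2), (2, 3), (3, 4);
""")

# One join on `knows` per hop: depth 2 means two joins, depth 3 three, and so on.
FRIENDS_OF_FRIENDS = """
    SELECT DISTINCT p2.name
    FROM person p0
    JOIN knows k1 ON k1.src = p0.id
    JOIN knows k2 ON k2.src = k1.dst
    JOIN person p2 ON p2.id = k2.dst
    WHERE p0.name = ?
"""


def friends_of_friends(name):
    """Names reachable from `name` in exactly two directed `knows` hops."""
    return [row[0] for row in conn.execute(FRIENDS_OF_FRIENDS, (name,))]
```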
  • 4. Database normalization
    ● Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy.
      – Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them.
    ● Normal forms
      – The first normal form (each attribute contains only atomic values)
      – The second normal form (each non-primary-key attribute is dependent on the whole primary key)
      – The third normal form (each non-primary-key attribute is dependent on nothing but the primary key)
    ● A relational database table is often described as "normalized" if it is in 3NF.
    ● When a database is intended for OLAP rather than OLTP, it is typically denormalized.
    ● Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data.
    ● Examples of denormalization techniques:
      – Materialised views
      – Star schemas
      – OLAP cubes
  • 5. Graph Database Highlights
    ● Graph data stores provide index-free adjacency, resulting in much better traversal performance compared with a traditional RDBMS.
    ● Designed predominantly for traversal performance and for executing graph algorithms.
    ● A graph database is a more natural, direct representation of a domain than an RDBMS (no need for junction tables).
    ● There is no need for joining tables because the data structure is already “joined” by the edges that are defined.
    ● In graph databases, denormalization is not needed!
    ● The interesting thing about graph diagrams is that they tend to contain specific instances of nodes and relationships, rather than classes or archetypes.
    ● The main purpose of graph databases is the analysis and visualization of graph data.
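Index-free adjacency can be pictured as each node holding direct references to its neighbours, so that a traversal follows pointers instead of consulting an index or a junction table. The following is only an in-memory illustration of that idea, not how any particular database stores its data; the `Node` class and the sample names are hypothetical.

```python
class Node:
    """A node that keeps direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []


def connect(a, b):
    """Add a directed edge a -> b by storing a reference, not a key."""
    a.neighbours.append(b)


def reachable_within(start, depth):
    """Names of nodes reachable from `start` in at most `depth` hops."""
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        # Each hop only dereferences neighbour lists; no lookup by key is needed.
        frontier = {n for node in frontier for n in node.neighbours if n not in seen}
        seen |= frontier
    return sorted(n.name for n in seen if n is not start)


jan, anna, piotr, ewa = (Node(n) for n in ("Jan", "Anna", "Piotr", "Ewa"))
connect(jan, anna)
connect(anna, piotr)
connect(piotr, ewa)
```

The cost of each hop depends on the number of neighbours visited, not on the total size of the graph.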
  • 6. Graph Database Models
    ● The Property Graph Model
      – The model is built of nodes and relationships.
      – Nodes contain key-value properties. Sometimes relationships do as well.
      – Relationships are named and directed, and always have a start and an end node.
    ● Hypergraphs
      – A generalization of the graph model.
      – A relationship can have any number of nodes at either end (many-to-many relationships).
    ● Triple stores
      – A triple expresses a relationship between two resources.
      – The triple is a subject-predicate-object data structure, e.g. Fred likes ice cream.
  • 7. Triple stores
    ● The Resource Description Framework (RDF) is a framework for expressing information about resources.
    ● Resources can be anything, including documents, people, physical objects, and abstract concepts.
    ● RDF is intended for situations in which information on the Web needs to be processed by applications, rather than being only displayed to people.
    ● RDF is a building block of the Semantic Web movement.
    ● RDF is a set of W3C specifications
      – SPARQL: SPARQL Protocol and RDF Query Language
    ● Disadvantages
      – Lack of index-free adjacency. Data is stored in the form of triples, which are independent artifacts. In order to traverse the graph, one needs to join multiple triples.
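The disadvantage noted above can be sketched with plain tuples. The triples here are made up for illustration, and the full scan in `objects` stands in for whatever index a real triple store would use. The point is that a two-hop traversal has to match the object of one triple against the subject of another, which is a join.

```python
# Hypothetical subject-predicate-object triples.
TRIPLES = [
    ("Fred", "knows", "Anna"),
    ("Anna", "likes", "ice cream"),
    ("Fred", "likes", "ice cream"),
    ("Anna", "knows", "Jan"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a stored triple."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def two_hop(subject, first, second):
    """Follow `first` then `second`; each hop is a separate lookup joined on the shared resource."""
    return [o2 for o1 in objects(subject, first) for o2 in objects(o1, second)]
```

For example, `two_hop("Fred", "knows", "likes")` answers "what do the people Fred knows like?" by combining two independent triples.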
  • 8. RDF example [G. Schreiber, Y. Raimond, RDF 1.1 Primer, W3C, 2014]
    ● In RDF, resources are identified by IRIs (Internationalized Resource Identifiers).
    ● RDF defines logical relationships. A number of different serialization formats exist for writing down RDF graphs:
      – Turtle
      – JSON-LD
      – RDFa
      – RDF/XML
    ● Popular RDF datasets:
      – Wikidata
      – DBpedia
      – WordNet
      – Europeana
      – VIAF
  • 9. Hypergraphs [I. Robinson, J. Webber, E. Eifrem, Graph Databases, O’Reilly Media, 2013]
    ● HyperGraphDB: http://www.hypergraphdb.org
    ● Using hypergraphs, we lose the ability to add properties to individual relationships.
  • 10. The Property Graph Model
    ● The most popular variant of the graph model.
    ● Only one-to-one relationships (each relationship connects exactly one start node and one end node).
    ● Property graph databases are typically schema-less. There is no notion of a database schema.
    ● Querying is often done in a specification-by-example way, i.e. by finding data (nodes and relationships) matching a specified pattern.
    ● Optimized for traversal.
    ● Popular solutions:
      – Neo4j (pure graph DBMS)
      – OrientDB (hybrid document and graph DBMS)
  • 11. Neo4j
    ● Written in Java, using some high-performance features of the JVM.
    ● Concepts:
      – Nodes (can have zero or more properties)
      – Relationships (always have a direction and a type; can have zero or more properties)
      – Labels for grouping nodes together (a node can have zero or more labels; labels have colors assigned)
    ● Neo4j is a schema-optional graph database (since version 2.0). There are two schema elements:
      – Indexes: you can create an index on a set of properties of nodes with a specific label (Apache Lucene)
      – Constraints: a constraint (currently only uniqueness) on a property of nodes with a given label (an index will be added automatically)
    ● Two versions/modes:
      – Web server with a pure RESTful API and a rich web GUI
      – Embedded Java library
    ● The RESTful API was designed with discoverability in mind. Just start with a GET on the service root (e.g. http://localhost:7474/db/data) and you will get a list of hyperlinks to the available resources.
  • 12. Cypher Query Language basics
    ● Cypher is a declarative query language based on pattern matching.
    ● Basic SQL syntax structure:
        SELECT columns FROM table WHERE conditions
    ● Basic Cypher syntax structure:
        MATCH pattern WHERE conditions RETURN nodes
    ● Patterns are defined as ASCII-art graphs, e.g.:
        MATCH x-->y RETURN x
    ● It is possible to create data with Cypher as well:
        CREATE ({key:"value"})
  • 13. Cypher basic examples
    ● Create a simple node:
        create ({name:"Anna"})
    ● Retrieve all the nodes:
        match x return x
    ● Create a labeled node with some properties:
        create (x:Person {name:"Jan", from: "Poland"})
    ● Retrieve all the nodes labeled Person whose property from equals "Poland":
        match (y:Person) where y.from = "Poland" return y
    ● Create a relationship:
        match x where x.name="Anna" match (y:Person) create x-[:knows]->y
  • 14. Traversal queries
    ● Find Jan's friends. Return him and his friends.
        MATCH (x:Person)-[:knows]-(friends)
        WHERE x.name = "Jan"
        RETURN x, friends
    ● Find friends of Jan's friends who like surfing.
        MATCH (x:Person)-[:knows]-()-[:knows]-(surfer)
        WHERE x.name = "Jan" AND surfer.hobby = "surfing"
        RETURN DISTINCT surfer
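As a rough illustration of what the second query asks for, the same traversal can be written by hand over an edge list. The data and the names `KNOWS` and `HOBBY` are hypothetical, and this is a sketch of the query's meaning rather than of how Neo4j evaluates it. Edges are followed in either direction (the pattern has no arrowheads), the same relationship is not used twice within one match, and a set plays the role of DISTINCT.

```python
# Hypothetical data: directed "knows" edges and a hobby per person.
KNOWS = [("Jan", "Anna"), ("Anna", "Ola"), ("Piotr", "Jan"),
         ("Piotr", "Ola"), ("Anna", "Marek")]
HOBBY = {"Jan": "surfing", "Ola": "surfing", "Marek": "chess"}


def incident(person):
    """Yield (edge index, other end) for every edge touching `person`, ignoring direction."""
    for i, (a, b) in enumerate(KNOWS):
        if a == person:
            yield i, b
        elif b == person:
            yield i, a


def surfing_friends_of_friends(person):
    """People two `knows` hops away from `person` whose hobby is surfing."""
    found = set()  # a set removes duplicates, like DISTINCT
    for e1, friend in incident(person):
        for e2, fof in incident(friend):
            # Do not walk back over the edge we arrived on.
            if e2 != e1 and HOBBY.get(fof) == "surfing":
                found.add(fof)
    return sorted(found)
```

Ola is reachable through both Anna and Piotr but is reported once, and Jan himself is not returned even though he surfs.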
  • 15. Starting points
    ● Patterns often have starting points, i.e. nodes or relationships that are explicitly given.
    ● It is possible to specify the starting point using a WHERE clause (as in the previous slide), but this can be inefficient when there are no indexes.
    ● A more appropriate way of specifying the starting point (node or relationship) is the START keyword.
    ● These starting points are obtained via index lookups or, more rarely, accessed directly by node or relationship ID:
      – START n=node:index-name(key = "value")
      – START n=node(id)
  • 16. START clause example [I. Robinson, J. Webber, E. Eifrem, Graph Databases, O’Reilly Media, 2013]
    ● Find the mutual friends of the user named “Michael”:
        START a=node:user(name='Michael')
        MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
        RETURN b, c
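The MATCH pattern above describes a directed triangle: c knows b, b knows a, and c also knows a. A hand-written equivalent over a hypothetical set of directed edges might look like the sketch below; the names and data are illustrative, and it again shows the meaning of the pattern rather than the engine's evaluation strategy.

```python
# Hypothetical directed KNOWS edges as (from, to) pairs.
KNOWS = {("Sarah", "Ben"), ("Ben", "Michael"),
         ("Sarah", "Michael"), ("Tom", "Michael")}


def mutual_friend_pairs(a):
    """All (b, c) such that c -> b, b -> a and c -> a are three distinct KNOWS edges."""
    pairs = []
    for c, b in KNOWS:
        # b != a keeps (c, b) and (c, a) from being the same edge.
        if b != a and (b, a) in KNOWS and (c, a) in KNOWS:
            pairs.append((b, c))
    return sorted(pairs)
```

With this data, `mutual_friend_pairs("Michael")` finds Ben and Sarah; Tom knows Michael but is not part of any triangle.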
  • 17. D3.js-based graph visualization of the example data set
  • 18. Transaction management
    ● Neo4j provides full ACID support.
    ● All relationships must have a valid start node and end node. In effect, this means that trying to delete a node that still has relationships attached will throw an exception upon commit.
    ● When updating or inserting massive amounts of data, the periodic commit query hint (USING PERIODIC COMMIT) can be helpful.
    ● Currently only one isolation level (READ_COMMITTED) is supported.
    ● In order to execute a query inside a transaction, POST the query to http://localhost:7474/db/data/transaction/{id}
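A request for the transaction URL above could be assembled as in the following sketch. It assumes the JSON body shape of the Neo4j 2.x transactional HTTP endpoint (a `statements` list whose entries carry `statement` and `parameters`); the helper name and the transaction id in the usage note are hypothetical, and the request is only built here, not sent.

```python
import json
from urllib.request import Request

# Base URL taken from the slide; the transaction id is appended per request.
BASE = "http://localhost:7474/db/data/transaction"


def statement_request(tx_id, cypher, parameters=None):
    """Build (but do not send) a POST that runs `cypher` inside transaction `tx_id`."""
    # Assumed body shape: {"statements": [{"statement": ..., "parameters": ...}]}
    body = {"statements": [{"statement": cypher, "parameters": parameters or {}}]}
    return Request(
        url=f"{BASE}/{tx_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json; charset=UTF-8"},
        method="POST",
    )
```

For instance, `statement_request(7, "MATCH (n) RETURN n")` targets `.../transaction/7`; passing the result to `urllib.request.urlopen` would send it to a running server.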
  • 19. Native Graph Storage [I. Robinson, J. Webber, E. Eifrem, Graph Databases, O’Reilly Media, 2013]
    ● There are separate stores for nodes, relationships and properties.
    ● In order to be able to compute a record’s location at cost O(1), all stores are fixed-size record stores.
    ● Node records are 9 bytes long.
    ● Relationships are stored in doubly linked lists: firstPrevRelId, firstNextRelId, secondPrevRelId and secondNextRelId point to the previous and next relationship records for the start and end nodes.
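The O(1) claim follows from simple arithmetic: with fixed-size records, the byte offset of a record is its id multiplied by the record size. The sketch below uses a 9-byte node record; the field layout (an in-use flag, a first-relationship id and a first-property id) is an assumption made for illustration and may not match the actual on-disk format.

```python
import struct

# Assumed layout: 1-byte in-use flag + 4-byte first relationship id
# + 4-byte first property id = 9 bytes, big-endian, no padding.
NODE_RECORD = struct.Struct(">BII")


def offset_of(node_id):
    """Byte offset of a node record: one multiplication, no search."""
    return node_id * NODE_RECORD.size


def write_node(store, node_id, first_rel, first_prop):
    """Write an in-use node record at its computed offset."""
    NODE_RECORD.pack_into(store, offset_of(node_id), 1, first_rel, first_prop)


def read_node(store, node_id):
    """Read (in_use, first_rel, first_prop) for `node_id`."""
    in_use, first_rel, first_prop = NODE_RECORD.unpack_from(store, offset_of(node_id))
    return bool(in_use), first_rel, first_prop


# An in-memory stand-in for the node store file, sized for 100 records.
store = bytearray(NODE_RECORD.size * 100)
write_node(store, 42, first_rel=7, first_prop=3)
```

Reading node 42 touches exactly the bytes at offset 42 × 9, regardless of how many other records the store contains.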
  • 20. Scalability
    ● On a single server, Neo4j is capable of managing 34 × 10⁹ nodes.
    ● Currently, only full database replication for read-only purposes is available:
      – Master-slave architecture to support fault tolerance
      – Horizontal scaling for read-mostly purposes
    ● Open transactions are not shared among members of an HA cluster. Therefore, if you use the transactional endpoint in an HA cluster, you must ensure that all requests for a given transaction are sent to the same Neo4j instance.
    ● As stated earlier, data in a graph database is already “joined”, so it is hard to partition (shard) a graph across multiple machines.
    ● The Neo4j team is working on this, but it is not ready yet. Ideally, tightly connected nodes (or nodes belonging to a common domain) would be kept together on the same machine, and loosely connected nodes (or nodes belonging to different domains) on separate machines.
    ● The problem is that a connection that is currently loose can one day become tight, and vice versa.
  • 21. Graph algorithms
    ● Both graph theory and graph algorithms are mature and well-understood fields of computer science, and both can be used to mine sophisticated information from graph databases.
    ● Neo4j supports both depth-first and breadth-first search.
      – The search type can be specified using BranchSelector and BranchOrderingPolicy.
    ● Graph algorithms available in Neo4j:
      – all paths (find all paths between two nodes)
      – all simple paths (find paths with no repeated nodes)
      – shortest paths (find paths with the fewest relationships); can find all shortest paths (if there is more than one) or just the first one
      – Dijkstra (find paths with the lowest cost)
      – A* (Dijkstra's algorithm extended with a heuristic)
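The difference between "shortest path" (fewest relationships) and Dijkstra (lowest cost) is easiest to see on a small weighted graph. The following is a generic textbook sketch in plain Python, not Neo4j's implementation; the graph and its weights are made up.

```python
import heapq
from collections import deque

# Hypothetical weighted, directed graph: node -> {neighbour: cost}.
GRAPH = {"A": {"B": 1, "C": 5}, "B": {"C": 1}, "C": {}}


def fewest_hops(graph, start, goal):
    """Breadth-first search: a path with the fewest relationships, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def lowest_cost(graph, start, goal):
    """Dijkstra's algorithm: (total cost, path) with the lowest cost, or None."""
    heap = [(0, [start])]
    done = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == goal:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, weight in graph[node].items():
            if nxt not in done:
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return None
```

From A to C, the direct edge wins on hop count, while the detour through B wins on cost.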
  • 22. Example of finding the shortest path using the REST API
    ● Example request:
        POST http://localhost:7474/db/data/node/35/path
        Accept: application/json; charset=UTF-8
        Content-Type: application/json
        {
          "to" : "http://localhost:7474/db/data/node/30",
          "max_depth" : 3,
          "relationships" : { "type" : "to", "direction" : "out" },
          "algorithm" : "shortestPath"
        }
    ● Example response:
        200: OK
        Content-Type: application/json; charset=UTF-8
        {
          "start" : "http://localhost:7474/db/data/node/35",
          "nodes" : [ "http://localhost:7474/db/data/node/35", "http://localhost:7474/db/data/node/31", "http://localhost:7474/db/data/node/30" ],
          "length" : 2,
          "relationships" : [ "http://localhost:7474/db/data/relationship/26", "http://localhost:7474/db/data/relationship/32" ],
          "end" : "http://localhost:7474/db/data/node/30"
        }
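On the client side, the response above is plain JSON in which nodes and relationships are given as URIs. A small sketch of extracting the numeric ids from it follows; the helper names are hypothetical, and the response text is copied from the example rather than fetched from a server.

```python
import json

# The example response body from the slide.
RESPONSE = """
{
  "start" : "http://localhost:7474/db/data/node/35",
  "nodes" : [ "http://localhost:7474/db/data/node/35",
              "http://localhost:7474/db/data/node/31",
              "http://localhost:7474/db/data/node/30" ],
  "length" : 2,
  "relationships" : [ "http://localhost:7474/db/data/relationship/26",
                      "http://localhost:7474/db/data/relationship/32" ],
  "end" : "http://localhost:7474/db/data/node/30"
}
"""


def trailing_id(uri):
    """Numeric id at the end of a node or relationship URI."""
    return int(uri.rsplit("/", 1)[1])


def summarize_path(body):
    """Return (node ids, relationship ids, length) for a path response."""
    path = json.loads(body)
    nodes = [trailing_id(u) for u in path["nodes"]]
    rels = [trailing_id(u) for u in path["relationships"]]
    return nodes, rels, path["length"]
```

The path visits nodes 35, 31 and 30 over two relationships, which agrees with the reported length of 2.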
  • 23. Spring Data Neo4j
    ● Spring Data is an umbrella project that makes it easy to use new data access technologies, such as non-relational databases, map-reduce frameworks, and cloud-based data services.
    ● Spring Data Neo4j is an integration library for Neo4j, and it was the first Spring Data project.
        @NodeEntity
        public class Movie {
            @GraphId Long id;
            @Indexed(type = FULLTEXT, indexName = "search") String title;
            Person director;
            @RelatedTo(type="ACTS_IN", direction = INCOMING) Set<Person> actors;
            @Query("start movie=node({self}) match movie-->genre<--similar return similar")
            Iterable<Movie> similarMovies;
        }
  • 24. Bibliography
    ● I. Robinson, J. Webber, E. Eifrem, Graph Databases, O’Reilly Media, 2013
    ● R. Angles, C. Gutierrez, Survey of graph database models, ACM Computing Surveys (CSUR), 2008
    ● M. A. Rodriguez, P. Neubauer, The Graph Traversal Pattern, Graph Data Management: Techniques and Applications, 2011
    ● J. Partner, A. Vukotic, N. Watt, Neo4j in Action, Manning, 2014
    ● E. Redmond, J. R. Wilson, Seven Databases in Seven Weeks, The Pragmatic Bookshelf, 2012
    ● G. Schreiber, Y. Raimond, RDF 1.1 Primer, W3C, 2014

Editor's notes

  1. G – graph, V – set of vertices, E – set of edges
  2. D3 - Data-Driven Documents