One of the main reasons RDBMS users resist moving to a NoSQL product is the complexity of the model: OK, NoSQL products are great for BigData and BigScale, but what about the model?
Switching from the Relational to the Graph model
1. Switching from the Relational to the Graph model
Luca Garulli
Founder and CEO @NuvolaBase Ltd
Author of OrientDB Doc/Graph DB
Oct 6th 2012 in Barcelona
(c) Luca Garulli Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License Page 1
www.orientechnologies.com
2. One of the main reasons RDBMS users resist moving to a NoSQL product is the complexity of the model:
OK, NoSQL products are great for BigData and BigScale, but...
3. ...but what about the model?
4. What is the NoSQL answer to managing complex domains?
Key-Value stores
Column-Based
Document database
Graph database
5. CAUTION!
This presentation will not use a social-network-like domain with the classic friend-of-friend^N paradigm, where graph databases are already widely used...
6. ...but rather we will explore how to think «graphically» about one of the most common domains in the enterprise world:
The good old CRM* domain
* today an RDBMS is used in 99% of cases
7. Every developer knows
the Relational Model,
but who knows the
Graph one?
8. Back to school:
Graph Theory crash course
10. Property Graph Model*
Edges are directed.
Vertices and Edges can have properties.
[Diagram: vertex "Luca" (name: Luca, surname: Garulli, company: NuvolaBase) --Likes (since: 2012)--> vertex "NoSQL Matters conference" (editions: [Cologne, Barcelona])]
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
11. Property Graph Model
[Diagram: vertex "Luca" connected by an edge to vertex "NoSQL Matters conference"]
An Edge connects 2 vertices: use multiple edges to represent 1-N and N-M relationships.
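The property graph model described above can be sketched in a few lines of plain Java. This is only an in-memory illustration: the `Vertex`, `Edge` and `outgoing` names are hypothetical and are not the Blueprints or OrientDB API, and the "OrientDB" vertex is invented to show a second edge leaving the same vertex.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of the property graph model.
// These classes are NOT the Blueprints or OrientDB API.
class PropertyGraphSketch {

    static class Vertex {
        final Map<String, Object> properties = new LinkedHashMap<String, Object>();
        final List<Edge> out = new ArrayList<Edge>(); // edges leaving this vertex
        final List<Edge> in = new ArrayList<Edge>();  // edges entering this vertex

        Vertex set(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    static class Edge {
        final String label;
        final Vertex from; // tail of the directed edge
        final Vertex to;   // head of the directed edge
        final Map<String, Object> properties = new LinkedHashMap<String, Object>();

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
            from.out.add(this); // register the edge on both endpoints
            to.in.add(this);
        }

        Edge set(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    // Collects the target vertices of all outgoing edges with the given label.
    static List<Vertex> outgoing(Vertex v, String label) {
        List<Vertex> result = new ArrayList<Vertex>();
        for (Edge e : v.out) {
            if (e.label.equals(label)) {
                result.add(e.to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Vertex luca = new Vertex().set("name", "Luca").set("surname", "Garulli");
        Vertex conference = new Vertex().set("name", "NoSQL Matters");
        Vertex orientdb = new Vertex().set("name", "OrientDB"); // invented for illustration

        // A 1-N relationship is simply several edges leaving the same vertex.
        new Edge("Likes", luca, conference).set("since", 2012);
        new Edge("Likes", luca, orientdb);

        System.out.println(outgoing(luca, "Likes").size()); // prints 2
    }
}
```

Each edge still connects exactly two vertices; the 1-N relationship emerges from the number of edges, not from a special edge type.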
12. Property Graph Model
[Diagram: vertices "Pere", "Luca", "Katja" and "NoSQL Matters conference", connected by "Likes", "Joins" and "FriendOf" edges]
13. Congratulations, this is your diploma in «Graph Theory»
14. Now back to our domain: the CRM
15. Domain: minimal CRM
Registry system: Customer, Address
Order system: Order, Stock
16. Domain: minimal CRM
Registry system: Customer, Address
Order system: Order, Stock
How does a Relational DBMS manage relationships?
17. Relational World: 1-1 Relationships
Customer
Id | Name   | Address
10 | Luca   | 34
11 | Katja  | 44
34 | Sylvia | 54
56 | Mark   | 66
88 | Steve  | 68
Address
Id | Location
34 | Rome, London
44 | Cologne
54 | Rome
66 | New Mexico
68 | Palo Alto
JOIN Customer.Address -> Address.Id
18. Relational World: 1-N Relationships
Customer
Id | Name
10 | Luca
11 | Katja
34 | Sylvia
56 | Mark
88 | Steve
Address
Id | Customer | Location
24 | 10       | Rome
33 | 10       | London
44 | 34       | Rome
66 | 11       | Cologne
68 | 88       | Palo Alto
Inverse JOIN Address.Customer -> Customer.Id
19. Relational World: N-M Relationships
Customer
Id | Name
10 | Luca
11 | Katja
34 | Sylvia
56 | Mark
88 | Steve
CustomerAddress
Id | Address
10 | 24
10 | 33
34 | 24
Address
Id | Location
24 | Rome
33 | London
44 | Rome
66 | Cologne
68 | Palo Alto
Additional table with 2 JOINs:
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id
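To make the two JOINs of the N-M case concrete, here is a minimal sketch in plain Java collections that uses the rows from the tables above. The class and method names are hypothetical, the join table is scanned linearly for brevity (a real RDBMS would normally use an index there as well), and the `TreeMap` merely stands in for a primary-key index on Address.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of resolving an N-M relationship through a join table.
class JoinTableSketch {

    // Address table: Id -> Location. The TreeMap plays the role of a key index.
    static final Map<Integer, String> ADDRESS = new TreeMap<Integer, String>();

    // CustomerAddress join table, one row per pair: {customer id, address id}.
    static final int[][] CUSTOMER_ADDRESS = { {10, 24}, {10, 33}, {34, 24} };

    static {
        ADDRESS.put(24, "Rome");
        ADDRESS.put(33, "London");
        ADDRESS.put(44, "Rome");
        ADDRESS.put(66, "Cologne");
        ADDRESS.put(68, "Palo Alto");
    }

    // Returns the locations of a customer by following both JOINs.
    static List<String> locationsOf(int customerId) {
        List<String> result = new ArrayList<String>();
        for (int[] row : CUSTOMER_ADDRESS) {
            // JOIN (1): CustomerAddress.Id -> Customer.Id
            if (row[0] == customerId) {
                // JOIN (2): CustomerAddress.Address -> Address.Id (a key lookup)
                result.add(ADDRESS.get(row[1]));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(locationsOf(10)); // prints [Rome, London]
        System.out.println(locationsOf(34)); // prints [Rome]
    }
}
```

Every call repeats both matching steps from scratch, because the relationship itself is never stored, only the keys that allow it to be recomputed.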
21. What's wrong with the Relational Model?
22. The JOIN is evil!
[Diagram: the Customer, CustomerAddress and Address tables from the N-M example]
These are all JOINs executed every time you traverse a relationship!
23. A JOIN means searching for a key in another table.
The first rule to improve performance is indexing all the keys.
An index speeds up searches but slows down inserts, updates and deletes.
24. So in the best case a JOIN is a lookup in an index.
This is done for every single JOIN!
If you traverse hundreds of relationships, you're executing hundreds of JOINs.
25. Index Lookup: is it really that fast?
26. Index Lookup: how does it work?
Think of an Address Book where we have to find Luca's phone number.
A-Z
A-L | M-Z
27. Index Lookup: how does it work?
A-Z
A-L | M-Z
A-D | E-L | M-R | S-Z
Index algorithms are all similar and based on balanced trees.
28. Index Lookup: how does it work?
A-Z
A-L | M-Z
A-D | E-L | M-R | S-Z
A-B | C-D | E-G | H-L
29. Index Lookup: how does it work?
A-Z
A-L | M-Z
A-D | E-L | M-R | S-Z
A-B | C-D | E-G | H-L
E-F | G | H-J | K-L
30. Index Lookup: how does it work?
A-Z
A-L | M-Z
A-D | E-L | M-R | S-Z
A-B | C-D | E-G | H-L
E-F | G | H-J | K-L
Luca: Found!
Each lookup takes X steps, where X grows with the index size!
31. An index lookup is executed for each JOIN.
Querying more tables can easily produce millions of JOINs/lookups!
Here is the rule: more entries = more lookup steps = slower JOIN
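The rule "more entries = more lookup steps" can be observed with a small experiment. The sketch below counts the comparisons of a binary search over a sorted array, which stands in here for the balanced tree of a real index (an assumption made for brevity; production indexes are usually B-tree variants, but the logarithmic growth is the same idea). All names are hypothetical.

```java
// Hypothetical illustration: lookup steps grow with the number of indexed keys.
class IndexLookupSketch {

    // Returns how many comparisons a binary search needs to find `key`
    // in the sorted array `keys`, or -1 if the key is absent.
    static int lookupSteps(int[] keys, int key) {
        int low = 0;
        int high = keys.length - 1;
        int steps = 0;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            steps++;
            if (keys[mid] == key) {
                return steps;
            }
            if (keys[mid] < key) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1;
    }

    // Builds the sorted key set 0, 1, ..., n - 1.
    static int[] sortedKeys(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = i;
        }
        return keys;
    }

    public static void main(String[] args) {
        // Look up the last key in indexes of growing size.
        for (int n = 1000; n <= 1000000; n *= 10) {
            int steps = lookupSteps(sortedKeys(n), n - 1);
            System.out.println(n + " entries -> " + steps + " steps");
        }
    }
}
```

The step count grows slowly, roughly with log2 of the number of entries, but it is paid again for every JOIN in the query.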
32. Is there a better way to
manage relationships?
33. "A graph database is any storage system that provides index-free adjacency"
- Marko Rodriguez
34. How does a GraphDB manage index-free relationships?
35. OrientDB: an Open Source (Apache 2) document-graph NoSQL DBMS
Supports: transactions, extended SQL, Multi-Master replication, etc.
36. OrientDB: traverse a relationship
The Record ID (RID) is a physical position.
Vertex "Luca", RID = #13:35
  label: "Customer"
  name: "Luca"
  out: [#14:54]
Edge "Lives", RID = #14:54
  label: "Lives"
  out: [#13:35]
  in: [#13:100]
Vertex "Rome", RID = #13:100
  label: "Address"
  name: "Rome"
  in: [#14:54]
37. A GraphDB handles relationships as a physical LINK to the record, assigned when the edge is created.
An RDBMS, on the other hand, computes the relationship every time you query the database.
Isn't that crazy?!
38. This means jumping from an O(log N) algorithm to a near O(1) one:
the traversal cost is no longer affected by the database size!
This is huge in the BigData age.
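The difference between the two approaches can be illustrated in plain Java. This is an analogy rather than OrientDB's implementation: a Java object reference stands in for the physical RID link, a `TreeMap` stands in for the index an RDBMS consults on each JOIN, and all class and method names are hypothetical.

```java
import java.util.TreeMap;

// Hypothetical analogy: keyed lookup through an index vs. following a stored link.
class AdjacencySketch {

    static class Address {
        final String location;
        Address(String location) { this.location = location; }
    }

    // Relational style: the customer row stores only a foreign key.
    static class CustomerRow {
        final String name;
        final int addressId;
        CustomerRow(String name, int addressId) {
            this.name = name;
            this.addressId = addressId;
        }
    }

    // Graph style: the customer vertex stores a direct link to the address,
    // assigned once, when the relationship is created.
    static class CustomerVertex {
        final String name;
        final Address lives;
        CustomerVertex(String name, Address lives) {
            this.name = name;
            this.lives = lives;
        }
    }

    // O(log N): the tree is searched again on every traversal.
    static String locationViaIndex(CustomerRow customer, TreeMap<Integer, Address> index) {
        return index.get(customer.addressId).location;
    }

    // Near O(1): a single dereference, independent of how many addresses exist.
    static String locationViaLink(CustomerVertex customer) {
        return customer.lives.location;
    }

    public static void main(String[] args) {
        Address rome = new Address("Rome");
        TreeMap<Integer, Address> addressIndex = new TreeMap<Integer, Address>();
        addressIndex.put(34, rome);
        addressIndex.put(44, new Address("Cologne"));

        System.out.println(locationViaIndex(new CustomerRow("Luca", 34), addressIndex)); // prints Rome
        System.out.println(locationViaLink(new CustomerVertex("Luca", rome)));           // prints Rome
    }
}
```

Both calls return the same answer; only the first one gets slower as the address table grows.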
39. OrientDB in the Blueprints micro-benchmark, on commodity hardware, with a hot cache, traverses 29.6 million records in less than 5 seconds:
about 6 million nodes traversed per second!
Do not try this at home with an RDBMS*!
* unless you live in Google's server farm
40. Create the graph in SQL
$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.2.0-SNAPSHOT (www.orientdb.org)
Type 'help' to display all the commands supported.
orientdb> create vertex V set name = 'Luca', label = 'Customer'
Created vertex #13:35 in 0.03 secs
orientdb> create vertex V set name = 'Rome', label = 'Address'
Created vertex #13:100 in 0.02 secs
orientdb> create edge E from #13:35 to #13:100 set label = 'Lives'
Created edge #14:54 in 0.02 secs
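41. Create the graph in Java
import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;

// The slide omits opening the database: an existing database must be
// opened first, for example with graph.open(user, password).
OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
// Create the "Luca" customer vertex
ODocument luca = graph.createVertex();
luca.field("name", "Luca");
luca.field("label", "Customer");
// Create the "Rome" address vertex
ODocument rome = graph.createVertex();
rome.field("name", "Rome");
rome.field("label", "Address");
// Connect the two vertices with a "Lives" edge and persist it
ODocument edge = graph.createEdge(luca, rome).field("label", "Lives");
edge.save();
graph.close();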
42. Query the graph in SQL
orientdb> select in[label='Lives'].out from V where label = 'Address' and name = 'Rome'
---+--------+--------------------+--------------------+--------------------+
  #| REC ID |label               |out                 |in                  |
---+--------+--------------------+--------------------+--------------------+
  0|   13:35|Luca                |[#14:54]            |                    |
---+--------+--------------------+--------------------+--------------------+
1 item(s) found. Query executed in 0.007 sec(s).
orientdb> select * from V where label = 'Address' AND in[label='Lives'].size() > 0
---+--------+--------------------+--------------------+--------------------+
  #| REC ID |label               |out                 |in                  |
---+--------+--------------------+--------------------+--------------------+
  0|  13:100|Rome                |                    |[#14:54]            |
---+--------+--------------------+--------------------+--------------------+
1 item(s) found. Query executed in 0.007 sec(s).
43. Query the graph in Java
import java.util.List;
import com.orientechnologies.orient.core.db.graph.OGraphDatabase;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.OCommandSQL;

OGraphDatabase graph = new OGraphDatabase("local:/tmp/db/graph");
// GET ALL THE CUSTOMERS FROM ROME, ITALY
List<ODocument> result = graph.command(new OCommandSQL(
    "select in[label='Lives'].out from V where label = 'Address' "
    + "and name = ?")
).execute("Rome");
for (ODocument v : result) {
  // "name" holds the customer's name in the graph created earlier
  // (the slide read the "label" field here)
  System.out.println("Result: " + v.field("name"));
}
----------------------------------------
Result: Luca
44. Query vs traversal
Once you have a well-connected database in the form of a Super Graph, you can traverse records instead of querying them!
All you need is a few root vertices from which to start traversing.
45. Query vs traversal
[Diagram: root vertices "Customers", "Special Customers" and "Stocks"; customer vertices "Luca", "John" and "Sylvia"; stock vertex "White Soap"; order vertices "Order 2332" and "Order 8834". Callout: "This is a root vertex"]
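Traversing from a root vertex can be sketched in plain Java as follows. The `Vertex` class and both methods are hypothetical, not OrientDB API, and the edges from Luca to Order 2332 and from that order to White Soap are assumed for illustration, since the slide text does not say how the diagram's vertices are connected.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of traversing a graph from a root vertex.
class TraversalSketch {

    static class Vertex {
        final String name;
        final List<Vertex> out = new ArrayList<Vertex>(); // directly linked vertices

        Vertex(String name) { this.name = name; }

        Vertex link(Vertex target) {
            out.add(target);
            return this;
        }
    }

    // Depth-first traversal: true if a vertex named `target` is reachable from `start`.
    static boolean reaches(Vertex start, String target) {
        Deque<Vertex> stack = new ArrayDeque<Vertex>();
        Set<Vertex> visited = new HashSet<Vertex>(); // guards against cycles
        stack.push(start);
        while (!stack.isEmpty()) {
            Vertex v = stack.pop();
            if (!visited.add(v)) {
                continue; // already seen
            }
            if (v.name.equals(target)) {
                return true;
            }
            for (Vertex next : v.out) {
                stack.push(next);
            }
        }
        return false;
    }

    // Starts at the root and keeps the customers from which the product is reachable.
    static List<String> customersWhoBought(Vertex root, String product) {
        List<String> names = new ArrayList<String>();
        for (Vertex customer : root.out) {
            if (reaches(customer, product)) {
                names.add(customer.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Vertex soap = new Vertex("White Soap");
        Vertex order2332 = new Vertex("Order 2332").link(soap); // assumed edge
        Vertex luca = new Vertex("Luca").link(order2332);       // assumed edge
        Vertex john = new Vertex("John");
        Vertex sylvia = new Vertex("Sylvia");
        Vertex customers = new Vertex("Customers").link(luca).link(john).link(sylvia);

        System.out.println(customersWhoBought(customers, "White Soap")); // prints [Luca]
    }
}
```

No index is consulted at any point: the traversal only follows links that were stored when the edges were created.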
46. Query the graph in SQL
Supposing that the root node "Customers" (#30:0) links all the Customer vertices.
Get all the customers:
orientdb> select out.in from #30:0
Get all the customers who bought at least one 'White Soap' product:
orientdb> select * from (
  select out.in from #30:0
) where out.in.out[label='Bought'].in.name = 'White Soap'
47. Demo time!
48. NuvolaBase.com
The first Graph Database on the Cloud
always available
a few seconds to set up
use it from apps & mobile
49. «Graphs change the way of modelling data»
Luca Garulli
Author of OrientDB, Document-Graph NoSQL Open Source project
CEO at NuvolaBase Ltd, London UK
www.twitter.com/lgarulli