OrientDB: Unlock the Value of Document Data Relationships

Fabrizio Fortino

@fabriziofortino
11th April 2016
#HUGIreland

@boistartups

The world is changing
• Unstructured Data

• Big Data Explosion

• Connected Data

• Mobile, IoT
http://destinhaus.com/internet-of-things-the-rise-of-smart-manufacturing/

“… starting a new strategic enterprise
application you should no longer be assuming
that your persistence should be relational. The
relational option might be the right one - but
you should seriously look at other alternatives.”
Polyglot Persistence [2011]
Martin Fowler
Rethink how we store data

A Polyglot Persistence example
E-commerce Application
Primary Store +

• Financial Data (RDBMS)

• Recommendations (Graph)

• Products Catalog (Document)

• User Sessions (Key-Value)
ETL Jobs / Data Synchronisation

More flexibility, at what price?

• Hire experts for each database type

• No standards between NoSQL products

• Increased overall complexity

• High TCO

• Write and maintain ETL and data synchronisation jobs

• Hard to refactor

• Testing can be tough

Entering Multi-Model Databases
• Graph

• Document

• Object

• Key/Value

• Full-Text

• Spatial

Multi-Model represents the intersection of multiple models in a single product

Product Positioning Quadrant
[Figure: quadrant chart. Axes: Relationship Complexity and Data Complexity. Products plotted: Relational, Key Value, Column, Graph, Document, Multi-Model]

OrientDB at a Glance

• First Multi-Model DBMS with a Graph Engine

• Community Edition FREE (Apache v2 License)

• Enterprise Edition (profiler, live monitor, telereporter, etc.)

• Vibrant community (≈ 100 contributors, ≈ 15K commits)

• Easy to install and use

• Zero configuration Multi-Master Architecture

• ACID

• Reactive (Live Queries)

Quite a long journey
[Figure: timeline covering 1998 and 2009 to 2016, with a Downloads / month axis (0, 200, 1K, 3K, 12K, 70K). Milestones: Orient ODBMS, first ever ODBMS with index-free adjacency; R&D; OrientDB, first ever multi-model DBMS released as Open Source; OrientDB Enterprise Launch]

Under the hood
User Application

APIs

• Document API: handles records as documents

• Graph API: TinkerPop Blueprints implementation

• Object API: POJO to document mapping

Storage

• Memory: works in memory only (ideal for integration testing)

• PLocal: writes to and reads from the file system

• Remote: delegates all operations to a remote server

Deployment options

• Embedded (in-process)

• Single, Standalone Node

• Multi-Master Replica

• Mixed

Document API
• Lowest level API

• A document (record) is the unit of storage

• An immutable ID (ORID) is automatically assigned to each document

• Documents can contain key-value pairs or nested/embedded documents (which have no ORID)

• Transaction support (optimistic mode with MVCC)

• Classes are logical sets of documents
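
To make the "optimistic mode with MVCC" bullet concrete, here is a minimal sketch in plain Python. It is not OrientDB's API: the `DocumentStore` class, its methods and the error name are hypothetical, and the only idea it illustrates is that each record carries a version that is checked at write time.

```python
class ConcurrentModificationError(Exception):
    """Raised when a record changed since the caller read it."""


class DocumentStore:
    """Toy in-memory store (hypothetical): every record has a version counter."""

    def __init__(self):
        self._records = {}  # rid -> (version, fields)

    def create(self, rid, fields):
        self._records[rid] = (0, dict(fields))

    def read(self, rid):
        version, fields = self._records[rid]
        return version, dict(fields)  # copy, so callers cannot mutate the store

    def update(self, rid, expected_version, fields):
        current_version, _ = self._records[rid]
        if current_version != expected_version:
            # Someone else committed first: no lock was held, so we detect it here.
            raise ConcurrentModificationError(rid)
        self._records[rid] = (current_version + 1, dict(fields))
        return current_version + 1


store = DocumentStore()
store.create("#12:216", {"name": "Fabrizio"})

# Two clients read the same version of the record.
v_a, doc_a = store.read("#12:216")
v_b, doc_b = store.read("#12:216")

doc_a["city"] = "Dublin"
new_version = store.update("#12:216", v_a, doc_a)  # first writer wins

doc_b["city"] = "Cork"
try:
    store.update("#12:216", v_b, doc_b)
    conflict = False
except ConcurrentModificationError:
    conflict = True  # the second writer has to re-read and retry
```

The design choice is that readers never block writers: a conflict is only discovered when the stale writer tries to commit.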

Schema-less, Schema-full or Hybrid?
• Schema-less: relaxed model, the type of each field is inferred for each document

• Schema-full: strict model, schema with constraints on fields and validation rules

• Hybrid: mixed model, schema with mandatory and optional fields with constraints and validation rules
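
The difference between the three modes can be sketched as a small validator. This is an illustration only, not how OrientDB implements validation: the `validate` function, the `strict` flag and the `user_schema` definition are assumptions made for the example.

```python
def validate(doc, properties=None, strict=False):
    """Return a list of validation errors for doc.

    properties maps a field name to (python_type, mandatory).
    strict=True also rejects fields that the schema does not declare.
    """
    properties = properties or {}
    errors = []
    for name, (field_type, mandatory) in properties.items():
        if name not in doc:
            if mandatory:
                errors.append(f"missing mandatory field: {name}")
        elif not isinstance(doc[name], field_type):
            errors.append(f"wrong type for field: {name}")
    if strict:
        for name in doc:
            if name not in properties:
                errors.append(f"undeclared field: {name}")
    return errors


# Hypothetical schema: "name" is mandatory, "nationality" is optional.
user_schema = {"name": (str, True), "nationality": (str, False)}
doc = {"name": "Fabrizio", "city": "Dublin"}

schema_less = validate(doc)                           # no schema: anything goes
hybrid = validate(doc, user_schema)                   # declared fields checked, extras allowed
schema_full = validate(doc, user_schema, strict=True) # extras rejected
missing = validate({"city": "Dublin"}, user_schema)   # mandatory field absent
```

In this sketch the extra `city` field is accepted in schema-less and hybrid mode and rejected only in schema-full mode.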

Class concept is taken from OOP

• Classes can inherit from other classes, creating a tree (similar to RDF Schema)

• A sub-class inherits all the schema fields from its parents

• An abstract class is used as the foundation for other classes (it cannot have records)

• Class hierarchies allow native polymorphic queries

• 1 to 1 mapping with domain objects
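
A rough sketch of what inheritance, abstract classes and polymorphic queries mean in practice. Everything here is hypothetical (the `SchemaClass` type, the `insert` and `select` helpers, and the `person`, `user` and `speaker` classes); it models the behaviour described in the bullets, not OrientDB's internals.

```python
class SchemaClass:
    """Toy schema class with single inheritance."""

    def __init__(self, name, parent=None, abstract=False, fields=()):
        self.name = name
        self.parent = parent
        self.abstract = abstract
        self.own_fields = set(fields)

    def all_fields(self):
        # A sub-class sees its own fields plus everything it inherits.
        inherited = self.parent.all_fields() if self.parent else set()
        return inherited | self.own_fields

    def is_a(self, other):
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False


records = []  # list of (SchemaClass, document) pairs


def insert(cls, doc):
    if cls.abstract:
        raise ValueError(f"abstract class {cls.name} cannot have records")
    records.append((cls, doc))


def select(cls):
    # Polymorphic: matches records of cls and of any of its sub-classes.
    return [doc for record_cls, doc in records if record_cls.is_a(cls)]


person = SchemaClass("person", abstract=True, fields=["name"])
user = SchemaClass("user", parent=person, fields=["city"])
speaker = SchemaClass("speaker", parent=user, fields=["talk"])

insert(user, {"name": "Uli", "city": "Dublin"})
insert(speaker, {"name": "Fabrizio", "city": "Dublin", "talk": "OrientDB"})

all_people = select(person)      # records of user and speaker
only_speakers = select(speaker)  # just the speaker record
```

Querying the abstract `person` class returns records stored under any of its sub-classes, which is the sense in which the query is polymorphic.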

Let’s create a Document
{
  "@rid": "#12:216",
  "@class": "user",
  "name": "Fabrizio",
  "meetups": [
    {
      "name": "HUG Ireland",
      "city": "Dublin",
      "since": "14-03-2014"
    }
  ],
  "details": {
    "@type": "d",
    "@class": "user_details",
    "city": "Dublin",
    "nationality": "IT"
  }
}

• "@rid": Immutable Record ID

• "@class": Logical set

• "name": Property

• "meetups": Array of objects

• "details": Embedded document

With a traditional Document DB you have to
duplicate your data to some degree. The degree
depends on how complex the interdependencies
of the application domain are.

OrientDB combines the unique flexibility of
documents with the power of graphs to unlock the
business value of Document Data Relationships.

Graphs: everything old is new again
https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

What is a Graph Database?
“A Graph Database is any storage system
that provides index-free adjacency”
The Graph Traversal Pattern [2010]
Marko A. Rodriguez
G = (V, E), where G is the Graph, V the set of Vertices and E the set of Edges

What’s wrong with joins?

User
name      id
Fabrizio  10
Uli       12
John      13
Eddie     88

member
user_id  meetup_id
10       18
10       24
13       18
88       66

Meetup
id  name
18  HUG Ireland
57  AWS Users
24  Microservices
66  Scala

• Given a User (Fabrizio)

• Find Fabrizio (id=10) in the member table: O(log n)

• Find 18 and 24 (HUG Ireland & Microservices) in the Meetup table: O(log n)

• Joins are computed every time you cross relationships

• Time complexity grows with data: O(log n)

• Joining 3-4 tables with millions of records could create billions of combinations
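
The two lookups above can be sketched with sorted lists and binary search, which is roughly what an index lookup costs. This is an illustration using the rows from the slide; the variable names and the `meetups_of` helper are made up for the example, and a real RDBMS would use B-tree indexes rather than Python lists.

```python
from bisect import bisect_left, bisect_right

users = [("Fabrizio", 10), ("Uli", 12), ("John", 13), ("Eddie", 88)]
member = [(10, 18), (10, 24), (13, 18), (88, 66)]  # (user_id, meetup_id)
meetups = [(18, "HUG Ireland"), (57, "AWS Users"), (24, "Microservices"), (66, "Scala")]

user_ids = dict(users)          # name -> id
member_index = sorted(member)   # stands in for an index on member.user_id
meetup_index = sorted(meetups)  # stands in for an index on Meetup.id
meetup_ids = [row[0] for row in meetup_index]


def meetups_of(user_id):
    # Binary search for the slice of member rows with this user_id: O(log n).
    lo = bisect_left(member_index, (user_id,))
    hi = bisect_right(member_index, (user_id, float("inf")))
    names = []
    for _, meetup_id in member_index[lo:hi]:
        # One more binary search per matching row in the Meetup table: O(log n).
        pos = bisect_left(meetup_ids, meetup_id)
        names.append(meetup_index[pos][1])
    return names


fabrizio_meetups = meetups_of(user_ids["Fabrizio"])
uli_meetups = meetups_of(user_ids["Uli"])
```

Every call repeats the searches, and each search gets slower as the tables grow, which is the point the slide makes about joins.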

The Graph as an Index

• Given a User (Fabrizio)

• Traverse the member edges to reach HUG Ireland, O(1), and Microservices, O(1)

• Fabrizio is the index to reach the linked Meetups!

• Every vertex and edge is “hard wired” to its adjacent vertex or edge

• Traversing an edge does not require complex computation, near O(1)

• The traversal time is not affected by the database size

[Figure: Fabrizio connected by two member edges to HUG Ireland and Microservices]
Easier to sketch!
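
Index-free adjacency can be sketched with plain object references. The `Vertex` and `Edge` classes and the `out` helper below are hypothetical names chosen for the example, not the Blueprints API; the point is only that each vertex holds direct references to its edges, so no lookup structure is consulted while traversing.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out_edges = []  # direct references to outgoing edges


class Edge:
    def __init__(self, label, source, target):
        self.label = label
        self.source = source
        self.target = target
        source.out_edges.append(self)  # "hard wire" the edge to its source vertex


def out(vertex, label):
    # Cost depends on how many edges this vertex has, not on the database size.
    return [edge.target for edge in vertex.out_edges if edge.label == label]


fabrizio = Vertex("Fabrizio")
hug = Vertex("HUG Ireland")
micro = Vertex("Microservices")
scala = Vertex("Scala")  # present in the graph but not linked to Fabrizio

Edge("member", fabrizio, hug)
Edge("member", fabrizio, micro)

reached = [vertex.name for vertex in out(fabrizio, "member")]
```

Adding millions of unrelated vertices would not change the work done by `out(fabrizio, "member")`, since it only walks Fabrizio's own edge list.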

Combine Documents with Graphs
{
  "@rid": "12:216",
  "@class": "user",
  "details": {
    "@type": "d",
    "@class": "user_detail",
    "nationality": "IT"
  }
}
{
  "@rid": "13:12",
  "@class": "meetup",
  "city": "Dublin"
}
{
  "@rid": "14:32",
  "@class": "member",
  "since": "14-03-2014",
  "in": "12:216",
  "out": "13:12"
}
out_member=14:32 in_member=14:32
{
  "@rid": "15:79",
  "@class": "talk",
  "title": "OrientDB",
  "on": "11-04-2016",
  "in": "12:216",
  "out": "13:12"
}
out_talk=15:79
in_talk=15:79

Multi-relational Document Graph
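
The four records above can be walked as a graph by following record IDs. The sketch below stores them in a Python dict keyed by RID. The slide does not say which vertex carries `out_member` and which carries `in_member`, so the assignment here is an assumption derived from the edges' own `in` and `out` fields; the `neighbours` and `edges` helpers are hypothetical as well.

```python
# Records keyed by RID. Edge documents are copied from the slide; the
# placement of the in_*/out_* link fields on the vertices is an assumption.
db = {
    "12:216": {"@class": "user", "details": {"nationality": "IT"},
               "in_member": ["14:32"], "in_talk": ["15:79"]},
    "13:12": {"@class": "meetup", "city": "Dublin",
              "out_member": ["14:32"], "out_talk": ["15:79"]},
    "14:32": {"@class": "member", "since": "14-03-2014",
              "in": "12:216", "out": "13:12"},
    "15:79": {"@class": "talk", "title": "OrientDB", "on": "11-04-2016",
              "in": "12:216", "out": "13:12"},
}


def edges(rid, label):
    """Edge documents of the given class attached to a vertex, either direction."""
    vertex = db[rid]
    edge_rids = vertex.get("out_" + label, []) + vertex.get("in_" + label, [])
    return [db[edge_rid] for edge_rid in edge_rids]


def neighbours(rid, label):
    """RIDs of the vertices at the other end of those edges."""
    result = []
    for edge in edges(rid, label):
        result.append(edge["in"] if edge["out"] == rid else edge["out"])
    return result


meetup_rids = neighbours("12:216", "member")
meetup_city = db[meetup_rids[0]]["city"]
member_since = edges("12:216", "member")[0]["since"]
speakers = neighbours("13:12", "talk")
```

Because edges are documents too, a relationship can carry its own properties (`since`, `title`, `on`), and two vertices can be linked by several edge classes at once, which is what makes the graph multi-relational.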

Would you believe me if I said you can query
documents and graphs with an SQL-like syntax?
Show me something now! OK, time for a quick demo.
http://www.sharegoodstuffs.com/2011_12_12_archive.html

Use Case: raise standards in Irish Public Office

The challenges
• Aggressive deadline
• Large amount of data from different sources with different formats
• Messy, dirty data
• Connecting records from different sources that represent the same thing without a common identifier
• Multi-step traversal of fixed and inferred links to identify disparate entities connected by a path

The solution
OrientDB
Fuzzy Inference Engine

Technical Details
• Main Language: Groovy
• Database Type: OrientDB Embedded
• Fuzzy Inference Engine: Duke
• minHash proximity index based on Lucene to avoid a Cartesian product
• Probabilistic model with configurable statistical algorithms (Levenshtein, NGram, Soundex, Custom, etc.) to identify the same entities despite differences
• End-To-End Process Time < 10 min
• Deliverable: Database
• Preset of queries to answer the main questions (analysts can add or modify WHERE conditions on their own)
• GraphView to visually search and visualise data
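
As an illustration of how a string comparator such as Levenshtein can flag two records as the same entity despite spelling differences, here is a standalone sketch. It is not Duke's API or configuration: the `similarity` and `same_entity` helpers, the 0.8 threshold and the sample names are all assumptions made for the example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalised similarity in [0, 1], ignoring case and surrounding spaces."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def same_entity(a, b, threshold=0.8):
    # The threshold is an arbitrary value for this sketch, not a recommended setting.
    return similarity(a, b) >= threshold


# Hypothetical names as they might appear in two different sources.
likely_match = same_entity("John Murphy", "Jon Murphy")
unlikely_match = same_entity("John Murphy", "Mary Byrne")
```

Comparing every record with every other record is quadratic, which is why the bullets mention a proximity index to narrow down the candidate pairs before a comparator like this runs.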

What people from home perceived
≈ 20K tweets
#rteinvestigates: top hashtag in Ireland for 24 hours

OK but what about Big Data?
“While we’ve long understood the value of Big Data to better understand how people interact with us, we’ve noticed an alarming trend of Big Data envy: organizations using complex tools to handle “not-really-that-big” Data. Distributed map-reduce algorithms are a handy technique for large data sets, but many data sets we see could easily fit in a single node relational or graph database. Even if you do have more data than that, usually the best thing to do is to first pick out the data you need, which can often then be processed on such a single node”
ThoughtWorks Technology Radar, 5 April 2016

Begin the journey!
https://www.udemy.com/orientdb-getting-started/

Resources

• http://martinfowler.com/bliki/PolyglotPersistence.html

• https://en.wikipedia.org/wiki/Multi-model_database

• http://orientdb.com/

• https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

• http://arxiv.org/pdf/1004.1001.pdf

• https://www.udemy.com/orientdb-getting-started/

• http://www.rte.ie/news/investigations-unit/2015/1207/751833-rte-investigates/

• https://github.com/larsga/Duke

• https://www.thoughtworks.com/radar

Q & A
Thank you!