OrientDB: Unlock the Value of Document Data
Relationships
Fabrizio Fortino

@fabriziofortino
11th April 2016
#HUGIreland

@boistartups
The world is changing
Unstructured Data
Big Data Explosion
Connected Data
Mobile, IoT
http://destinhaus.com/internet-of-things-the-rise-of-smart-manufacturing/
“… starting a new strategic enterprise
application you should no longer be assuming
that your persistence should be relational. The
relational option might be the right one - but
you should seriously look at other alternatives.”
Polyglot Persistence [2011]
Martin Fowler
Rethink how we store data
A Polyglot Persistence example
E-commerce Application
Primary Store + Financial Data (RDBMS)
Recommendations (Graph)
Products Catalog (Document)
User Sessions (Key-Value)
ETL Jobs / Data Synchronisation
• Hire experts for each database type

• No standards across NoSQL products

• Increased overall complexity

• High TCO

• Write and maintain ETL and data synchronisation

• Hard to refactor

• Testing can be tough
More flexibility, at what price?
Entering Multi-Model Databases
Graph
Document
Object
Key/Value

Full-Text
Spatial
Multi-Model represents the intersection of multiple models in a single product
Product Positioning Quadrant
Relationship Complexity >
Data Complexity >
Relational
Key Value
Column
Graph
Document
Multi-Model
• First Multi-Model DBMS with a Graph Engine

• Community Edition FREE (Apache v2 License)

• Enterprise Edition (profiler, live monitor, telereporter, etc)

• Vibrant community (≈ 100 contributors, ≈ 15K commits)

• Easy to install and use

• Zero configuration Multi-Master Architecture

• ACID 

• Reactive (Live Queries)
OrientDB at a Glance
Quite a long journey
[Timeline figure, 1998 to 2016, with a downloads / month curve labelled 0, 200, 1K, 3K, 12K, 70K]
Milestones shown on the timeline:
• Orient ODBMS: first ever ODBMS with index-free adjacency
• R&D
• OrientDB: first ever multi-model DBMS released as Open Source
• OrientDB Enterprise launch
Under the hood
Storage
Memory

Works in Memory Only 

(Ideal for Integration Testing)
PLocal

Write/Read to/from File System
Remote

Delegates all Operations to a Remote
Server
Document API

Handles Records as Documents
Graph API

TinkerPop Blueprints Implementation
Object API

POJO to Document mapping
User Application
• Embedded (in-process)

• Single, Standalone Node

• Multi-Master Replica

• Mixed
Deployment options
Document API
• Lowest level API

• A document (record) is the unit of storage

• An immutable ID (ORID) is automatically assigned to each document

• Documents can contain key-value pairs or nested/embedded documents (which have no ORID)

• Transactions support (optimistic mode with MVCC)

• Classes are logical sets of documents
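The optimistic, MVCC-style transaction mode listed above can be sketched in a few lines of Python. This is an illustrative in-memory model, not the OrientDB API: `RecordStore`, `ConcurrentModificationError`, and the integer version counter are names made up for the example.

```python
class ConcurrentModificationError(Exception):
    """Raised when a record changed since the caller read it."""


class RecordStore:
    """Toy in-memory store: each record carries a version number."""

    def __init__(self):
        self._records = {}  # rid -> (version, document)

    def create(self, rid, document):
        self._records[rid] = (0, dict(document))

    def read(self, rid):
        version, document = self._records[rid]
        return version, dict(document)  # copy, so callers edit a snapshot

    def update(self, rid, expected_version, document):
        current_version, _ = self._records[rid]
        if current_version != expected_version:
            # Someone else committed first: reject instead of overwriting.
            raise ConcurrentModificationError(rid)
        self._records[rid] = (current_version + 1, dict(document))
        return current_version + 1


store = RecordStore()
store.create("#12:216", {"name": "Fabrizio"})

# Two clients read the same version of the record.
v_a, doc_a = store.read("#12:216")
v_b, doc_b = store.read("#12:216")

doc_a["city"] = "Dublin"
new_version = store.update("#12:216", v_a, doc_a)  # first writer wins

doc_b["city"] = "Cork"
try:
    store.update("#12:216", v_b, doc_b)  # stale version, rejected
    conflict = False
except ConcurrentModificationError:
    conflict = True
```

No lock is held between read and update; the second writer finds out about the conflict only at commit time and would normally reload the record and retry.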
Schema-less, Schema-full or Hybrid?
Schema-less

relaxed model, the type of each
field is inferred for each
document
Schema-full

strict model, schema with
constraints on fields and
validation rules
Hybrid

mixed model, schema with
mandatory and optional fields
with constraints and
validation rules
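As a rough illustration of the difference between the modes, the sketch below validates a document against a small schema. The schema layout, the `validate` function, and the `strict` flag are assumptions made for the example; they are not OrientDB's schema API.

```python
# Hypothetical schema for a "user" class: field -> (type, mandatory).
USER_SCHEMA = {
    "name": (str, True),
    "age": (int, False),
}


def validate(document, schema, strict=False):
    """Return a list of problems; an empty list means the document is valid.

    strict=False models the hybrid mode (undeclared fields are allowed),
    strict=True models a schema-full mode (undeclared fields are rejected).
    """
    problems = []
    for field, (expected_type, mandatory) in schema.items():
        if field not in document:
            if mandatory:
                problems.append(f"missing mandatory field: {field}")
        elif not isinstance(document[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    if strict:
        for field in document:
            if field not in schema:
                problems.append(f"undeclared field: {field}")
    return problems


doc = {"name": "Fabrizio", "nationality": "IT"}
hybrid_problems = validate(doc, USER_SCHEMA)               # extra field accepted
strict_problems = validate(doc, USER_SCHEMA, strict=True)  # extra field rejected
missing_problems = validate({"age": 30}, USER_SCHEMA)      # "name" is mandatory
```

A schema-less class would simply skip validation and store whatever fields each document carries.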
• A class can inherit from other classes, creating a tree
(similar to RDF Schema)

• A sub-class inherits all the schema fields from
the parents

• An abstract class is used as the foundation for
other classes (it cannot have records)

• Class hierarchies allow native polymorphic
queries

• 1 to 1 mapping with domain objects
Class concept is taken from OOP
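A polymorphic query returns the records of a class together with the records of all its sub-classes. The following sketch models that idea with plain Python data; the class names `person` and `speaker` and the helper functions are hypothetical and only serve the illustration.

```python
# Hypothetical class tree: class name -> parent class (None for a root).
CLASS_PARENT = {
    "person": None,      # imagined abstract base class, holds no records
    "user": "person",
    "speaker": "user",
    "meetup": None,
}

RECORDS = [
    {"@class": "user", "name": "Uli"},
    {"@class": "speaker", "name": "Fabrizio"},
    {"@class": "meetup", "name": "HUG Ireland"},
]


def is_a(class_name, ancestor):
    """True if class_name is ancestor or one of its descendants."""
    while class_name is not None:
        if class_name == ancestor:
            return True
        class_name = CLASS_PARENT[class_name]
    return False


def select_from(class_name):
    """Polymorphic query: records of the class and of all its sub-classes."""
    return [r for r in RECORDS if is_a(r["@class"], class_name)]


people = [r["name"] for r in select_from("person")]
speakers = [r["name"] for r in select_from("speaker")]
```

Querying the abstract `person` class returns both the `user` and the `speaker` record, while querying `speaker` returns only the most specific one.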
Let’s create a Document
{
  "@rid": "#12:216",
  "@class": "user",
  "name": "Fabrizio",
  "meetups": [
    {
      "name": "HUG Ireland",
      "city": "Dublin",
      "since": "14-03-2014"
    }
  ],
  "details": {
    "@type": "d",
    "@class": "user_details",
    "city": "Dublin",
    "nationality": "IT"
  }
}
Immutable Record ID
Logical set
Property
Array of objects
Embedded document
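To make the callouts concrete, the sketch below loads the same document with Python's standard `json` module and reads each kind of field. Splitting the record ID into a cluster number and a position reflects the usual `#cluster:position` reading of an ORID, which is background knowledge rather than something stated on the slide.

```python
import json

raw = """
{
  "@rid": "#12:216",
  "@class": "user",
  "name": "Fabrizio",
  "meetups": [
    {"name": "HUG Ireland", "city": "Dublin", "since": "14-03-2014"}
  ],
  "details": {
    "@type": "d",
    "@class": "user_details",
    "city": "Dublin",
    "nationality": "IT"
  }
}
"""

user = json.loads(raw)

# Split "#cluster:position" into its two numeric parts.
cluster_id, position = (int(part) for part in user["@rid"].lstrip("#").split(":"))

meetup_names = [m["name"] for m in user["meetups"]]  # array of objects
home_city = user["details"]["city"]                  # embedded document
has_own_rid = "@rid" in user["details"]              # embedded documents carry no ORID
```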
With a traditional Document DB you have to
duplicate your data to some degree. The degree
depends on how complex the interdependencies
of the application domain are.

OrientDB combines the unique flexibility of
documents with the power of graphs to unlock the
business value of Document Data Relationships.
Graphs: everything old is new again
https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
What is a Graph Database?
“A Graph Database is any storage system
that provides index-free adjacency”
The Graph Traversal Pattern [2010]
Marco A. Rodriguez
G = (V, E)
Graph
Vertex
Edge
• Given a User (Fabrizio)

• Find Fabrizio (id=10) in member table O(log n)

• Find 18 and 24 (HUG Ireland & Microservices) in Meetup table O(log n)
What’s wrong with joins?
User
name      id
Fabrizio  10
Uli       12
John      13
Eddie     88

member
user_id  meetup_id
10       18
10       24
13       18
88       66

Meetup
id  name
18  HUG Ireland
57  AWS Users
24  Microservices
66  Scala
• Joins are computed every time you cross relationships

• Time complexity grows with data: O(log n)

• Joining 3-4 tables with millions of records could create billions of combinations
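The lookups behind such a join can be sketched with sorted lists and binary search standing in for database indexes. This is a simplified model of index-based joining, assuming each table is kept sorted on its lookup key; a real RDBMS would use B-tree indexes and a query planner.

```python
from bisect import bisect_left

# Tables from the slide, each kept sorted on its lookup key like an index.
users = [("Eddie", 88), ("Fabrizio", 10), ("John", 13), ("Uli", 12)]  # by name
member = [(10, 18), (10, 24), (13, 18), (88, 66)]                     # by user_id
meetups = [(18, "HUG Ireland"), (24, "Microservices"),
           (57, "AWS Users"), (66, "Scala")]                           # by id


def meetups_of(user_name):
    """Resolve User -> member -> Meetup with one binary search per hop."""
    # Hop 1: find the user id by name, O(log n).
    i = bisect_left(users, (user_name,))
    if i == len(users) or users[i][0] != user_name:
        return []
    user_id = users[i][1]

    # Hop 2: find the first membership row for that user, O(log n),
    # then scan the rows that share the same user_id.
    names = []
    j = bisect_left(member, (user_id,))
    while j < len(member) and member[j][0] == user_id:
        meetup_id = member[j][1]
        # Hop 3: find the meetup by id, O(log n) for every membership.
        k = bisect_left(meetups, (meetup_id,))
        names.append(meetups[k][1])
        j += 1
    return names


fabrizio_meetups = meetups_of("Fabrizio")
```

Every hop repeats a search whose cost depends on the size of the table, and the same work is redone each time the relationship is crossed.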
• Given a User (Fabrizio)

• Traverse the member edges to reach HUG Ireland O(1) & Microservices O(1)

• Fabrizio is the index to reach the linked Meetups!
The Graph as an Index
• Every vertex and edge is “hard wired” to its adjacent vertex or edge

• Traversing an edge does not require complex computation, near O(1)

• The traversal time is not affected by the database size
[Diagram: Fabrizio --member--> HUG Ireland; Fabrizio --member--> Microservices]
Easier to sketch!
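Index-free adjacency can be illustrated with objects that hold direct references to their neighbours. The `Vertex` and `Edge` classes below are a minimal sketch written for this example, not OrientDB's storage format or the TinkerPop API.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out_edges = []  # direct references to Edge objects
        self.in_edges = []


class Edge:
    def __init__(self, label, out_vertex, in_vertex):
        self.label = label
        self.out_vertex = out_vertex  # vertex the edge leaves
        self.in_vertex = in_vertex    # vertex the edge enters
        # "Hard wire" the edge into both endpoints.
        out_vertex.out_edges.append(self)
        in_vertex.in_edges.append(self)


def out(vertex, label):
    """Follow outgoing edges with the given label; no index lookup involved."""
    return [e.in_vertex for e in vertex.out_edges if e.label == label]


fabrizio = Vertex("Fabrizio")
hug = Vertex("HUG Ireland")
microservices = Vertex("Microservices")
Edge("member", fabrizio, hug)
Edge("member", fabrizio, microservices)

reached = [v.name for v in out(fabrizio, "member")]
```

The cost of `out` depends only on how many edges the starting vertex has, not on how many vertices exist in total, which is the point of using the graph itself as the index.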
Combine Documents with Graphs
{
  "@rid": "12:216",
  "@class": "user",
  "name": "Fabrizio",
  "details": {
    "@type": "d",
    "@class": "user_detail",
    "city": "Dublin",
    "nationality": "IT"
  }
}
{
  "@rid": "13:12",
  "@class": "meetup",
  "name": "HUG Ireland",
  "city": "Dublin"
}
{
  "@rid": "14:32",
  "@class": "member",
  "since": "14-03-2014",
  "in": "12:216",
  "out": "13:12"
}
out_member=14:32 in_member=14:32
{
  "@rid": "15:79",
  "@class": "talk",
  "title": "OrientDB",
  "on": "11-04-2016",
  "in": "12:216",
  "out": "13:12"
}
out_talk=15:79
in_talk=15:79
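The way vertex documents and edge documents reference each other can be sketched with a dictionary keyed by record ID. Two assumptions are made here: the `out_member` / `out_talk` lists sit on the user and the `in_member` / `in_talk` lists on the meetup, and each edge is oriented from the user (`out`) to the meetup (`in`). That orientation is my reading of the annotations and may differ from the slide's diagram; the helper functions are hypothetical.

```python
# Records keyed by RID. Vertices and edges are all plain documents.
# Assumption: edges point from the user (out) to the meetup (in).
records = {
    "12:216": {"@class": "user", "name": "Fabrizio",
               "details": {"city": "Dublin", "nationality": "IT"},
               "out_member": ["14:32"], "out_talk": ["15:79"]},
    "13:12": {"@class": "meetup", "name": "HUG Ireland", "city": "Dublin",
              "in_member": ["14:32"], "in_talk": ["15:79"]},
    "14:32": {"@class": "member", "since": "14-03-2014",
              "out": "12:216", "in": "13:12"},
    "15:79": {"@class": "talk", "title": "OrientDB", "on": "11-04-2016",
              "out": "12:216", "in": "13:12"},
}


def out_edges(rid, label):
    """Edge documents of class `label` leaving the vertex `rid`."""
    return [records[e] for e in records[rid].get(f"out_{label}", [])]


def out_vertices(rid, label):
    """Vertices reached by following `label` edges out of `rid`."""
    return [records[edge["in"]] for edge in out_edges(rid, label)]


# Edge properties and vertex properties are available in the same traversal.
member_since = [e["since"] for e in out_edges("12:216", "member")]
meetup_names = [v["name"] for v in out_vertices("12:216", "member")]
talk_titles = [e["title"] for e in out_edges("12:216", "talk")]
```

Because edges are documents too, they can carry their own properties (`since`, `title`, `on`) without duplicating any data from the vertices they connect.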
Multi-relational Document Graph
Would you believe me if I said you can query
documents and graphs with an SQL-like syntax?
Show me something now! OK, time for a quick demo.
http://www.sharegoodstuffs.com/2011_12_12_archive.html
Use Case: raise standards in Irish Public Office
• Aggressive deadline
• Large amount of data from different sources with
different formats
• Messy, dirty data
• Connect records from different sources
representing the same thing without a common
identifier
• Multi-step traversal of fixed and inferred links
to identify disparate entities connected by a path
The challenges
The solution
OrientDB
Fuzzy Inference Engine
• Main Language: Groovy
• Database Type: OrientDB Embedded
• Fuzzy Inference Engine: Duke
• minHash proximity index based on Lucene to avoid cartesian
product
• probabilistic model with configurable statistical algorithms
(Levenshtein, NGram, Soundex, Custom, etc) to identify the
same entities despite differences
• End-To-End Process Time < 10 min
• Deliverable: Database
• Preset queries to answer the main questions (analysts can
add or modify WHERE conditions on their own)
• GraphView to visually search and visualise data
Technical Details
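The record-linkage idea behind these bullets can be sketched as follows. This is not Duke's implementation: the minHash proximity index is replaced here by a much simpler shared-trigram blocking step, only Levenshtein is used as the comparator, and the sample names and the threshold are made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Edit distance scaled to 0..1, where 1 means identical."""
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest


def trigrams(text):
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def candidate_pairs(names):
    """Blocking: only pair up names that share at least one trigram."""
    by_trigram = defaultdict(set)
    for index, name in enumerate(names):
        for gram in trigrams(name):
            by_trigram[gram].add(index)
    pairs = set()
    for indexes in by_trigram.values():
        pairs.update(combinations(sorted(indexes), 2))
    return pairs


# Made-up sample names, not data from the project.
names = ["John Murphy", "Jon Murphy", "Mary O'Brien", "Acme Holdings Ltd"]
threshold = 0.85  # arbitrary cut-off chosen for the example

matches = sorted(
    (names[i], names[j])
    for i, j in candidate_pairs(names)
    if similarity(names[i].lower(), names[j].lower()) >= threshold
)
```

The blocking step is what avoids the cartesian product: only records that already look alike are compared in full, and each accepted match could then be stored as an inferred link between the two records.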
What people from home perceived
≈ 20K tweets
Top hashtag in Ireland for 24 hours: #rteinvestigates
“While we’ve long understood the value of Big Data to better
understand how people interact with us, we’ve noticed an
alarming trend of Big Data envy: organizations using complex
tools to handle “not-really-that-big” Data. Distributed map-reduce
algorithms are a handy technique for large data sets,
but many data sets we see could easily fit in a single node
relational or graph database. Even if you do have
more data than that, usually the best thing to do is
to first pick out the data you need, which can often
then be processed on such a single node.”
OK but what about Big Data?
ThoughtWorks Technology Radar, 5 April 2016
Begin the journey!
https://www.udemy.com/orientdb-getting-started/
• http://martinfowler.com/bliki/PolyglotPersistence.html

• https://en.wikipedia.org/wiki/Multi-model_database

• http://orientdb.com/

• https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

• http://arxiv.org/pdf/1004.1001.pdf

• https://www.udemy.com/orientdb-getting-started/

• http://www.rte.ie/news/investigations-unit/2015/1207/751833-rte-investigates/

• https://github.com/larsga/Duke

• https://www.thoughtworks.com/radar
Resources
Thank you!
Q & A

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 

OrientDB: Unlock the Value of Document Data Relationships

  • 1. OrientDB: Unlock the Value of Document Data Relationships Fabrizio Fortino @fabriziofortino 11th April 2016 #HUGIreland @boistartups
  • 2. The world is changing: Unstructured Data, Big Data Explosion, Connected Data, Mobile, IoT. Image: http://destinhaus.com/internet-of-things-the-rise-of-smart-manufacturing/
  • 3. Rethink how we store data. “… starting a new strategic enterprise application you should no longer be assuming that your persistence should be relational. The relational option might be the right one - but you should seriously look at other alternatives.” Martin Fowler, Polyglot Persistence [2011]
  • 4. A Polyglot Persistence example. E-commerce Application: Primary Store + Financial Data (RDBMS), Recommendations (Graph), Products Catalog (Document), User Sessions (Key-Value), linked by ETL Jobs / Data Synchronisation
  • 5. More flexibility, at what price? • Hire experts for each database type • No standards between NoSQL products • Increased overall complexity • High TCO • Write and maintain ETL and data synchronisation • Hard to refactor • Testing can be tough
  • 6. Entering Multi-Model Databases: Graph, Document, Object, Key/Value, Full-Text, Spatial. Multi-Model represents the intersection of multiple models in a single product
  • 7. Product Positioning Quadrant [chart with axes Relationship Complexity and Data Complexity; plotted: Relational, Key Value, Column, Graph, Document, Multi-Model]
  • 8. OrientDB at a Glance • First Multi-Model DBMS with a Graph Engine • Community Edition free (Apache v2 License) • Enterprise Edition (profiler, live monitor, telereporter, etc.) • Vibrant community (≈ 100 contributors, ≈ 15K commits) • Easy to install and use • Zero-configuration Multi-Master Architecture • ACID • Reactive (Live Queries)
  • 9. Quite a long journey [timeline from 1998 to 2016 with a Downloads / month curve; axis values 0, 200, 1K, 3K, 12K, 70K]. Milestones: Orient ODBMS, first ever ODBMS with index-free adjacency; R&D; OrientDB, first ever multi-model DBMS released as Open Source; OrientDB Enterprise Launch
  • 10. Under the hood. User Application. APIs: Document API (handles records as documents), Graph API (TinkerPop Blueprints implementation), Object API (POJO-to-document mapping). Storage: Memory (works in memory only, ideal for integration testing), PLocal (writes to and reads from the file system), Remote (delegates all operations to a remote server)
  • 11. Deployment options • Embedded (in-process) • Single, Standalone Node • Multi-Master Replica • Mixed
  • 12. Document API • Lowest-level API • A document (record) is the unit of storage • An immutable id (ORID) is automatically assigned to each document • Documents can contain key-value pairs or nested/embedded documents (no ORID) • Transaction support (optimistic mode with MVCC) • Classes are logical sets of documents
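The optimistic MVCC mode mentioned on slide 12 can be illustrated with a toy in-memory store. This is a minimal sketch of the general technique, not OrientDB's API: `RecordStore`, `ConcurrentModificationError` and the version counter layout are assumptions made for illustration. Each record carries a version, and a write only succeeds if the writer read the latest version.

```python
class ConcurrentModificationError(Exception):
    """Raised when a record changed since the caller last read it."""


class RecordStore:
    """Toy in-memory store (hypothetical); each record carries a version counter."""

    def __init__(self):
        self._records = {}  # rid -> (version, fields)

    def create(self, rid, fields):
        self._records[rid] = (1, dict(fields))

    def read(self, rid):
        version, fields = self._records[rid]
        return version, dict(fields)  # copy, so callers cannot mutate the store

    def update(self, rid, expected_version, fields):
        current_version, _ = self._records[rid]
        # Optimistic check: no locks are held; a stale version is simply rejected.
        if current_version != expected_version:
            raise ConcurrentModificationError(rid)
        self._records[rid] = (current_version + 1, dict(fields))
        return current_version + 1


store = RecordStore()
store.create("#12:216", {"name": "Fabrizio"})

version, doc = store.read("#12:216")
doc["city"] = "Dublin"
new_version = store.update("#12:216", version, doc)  # accepted, version is bumped

try:
    # A second writer still holding the old version is rejected and must re-read.
    store.update("#12:216", version, {"name": "stale write"})
except ConcurrentModificationError:
    conflict_detected = True
```

The rejected writer would normally re-read the record and retry, which is why this mode suits workloads where conflicts are rare.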
  • 13. Schema-less, Schema-full or Hybrid? • Schema-less: relaxed model; the type of each field is inferred for each document • Schema-full: strict model; schema with constraints on fields and validation rules • Hybrid: mixed model; schema with mandatory and optional fields with constraints and validation rules
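The three schema modes on slide 13 can be sketched with a single validation function. This is an illustration in plain Python, not OrientDB code: `validate`, `USER_SCHEMA` and the reading that a strict schema rejects undeclared fields while a hybrid schema lets them through are assumptions.

```python
def validate(doc, schema=None, strict=False):
    """Return a list of problems; an empty list means the document is accepted.

    schema maps field name -> (type, mandatory). With schema=None the check is
    schema-less. strict=True also rejects fields the schema does not declare
    (schema-full); strict=False lets extra fields through (hybrid).
    """
    if schema is None:
        return []
    problems = []
    for field, (field_type, mandatory) in schema.items():
        if field not in doc:
            if mandatory:
                problems.append(f"missing mandatory field: {field}")
        elif not isinstance(doc[field], field_type):
            problems.append(f"wrong type for field: {field}")
    if strict:
        for field in doc:
            if field not in schema:
                problems.append(f"undeclared field: {field}")
    return problems


# Hypothetical schema: "name" is mandatory, "city" is optional.
USER_SCHEMA = {"name": (str, True), "city": (str, False)}
doc = {"name": "Fabrizio", "nationality": "IT"}

schema_less = validate(doc)                            # anything goes
hybrid = validate(doc, USER_SCHEMA)                    # extra field tolerated
schema_full = validate(doc, USER_SCHEMA, strict=True)  # extra field rejected
```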
  • 14. Class concept is taken from OOP • A class can inherit from other classes, creating a tree (similar to RDF Schema) • A sub-class inherits all the schema fields from its parents • An abstract class is used as the foundation for other classes (it cannot have records) • Class hierarchies allow native polymorphic queries • 1-to-1 mapping with domain objects
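A polymorphic query over a class hierarchy, as described on slide 14, can be sketched as follows. The `SchemaClass` type, the `select_from` helper and the `person` / `user` / `speaker` hierarchy are hypothetical names chosen for illustration; the sketch only shows the idea that querying a class also returns the records of its sub-classes and that an abstract class holds no records.

```python
class SchemaClass:
    """A named set of records that may extend a parent class."""

    def __init__(self, name, parent=None, abstract=False):
        self.name = name
        self.parent = parent
        self.abstract = abstract
        self.records = []

    def is_a(self, other):
        """True if this class is `other` or descends from it."""
        cls = self
        while cls is not None:
            if cls is other:
                return True
            cls = cls.parent
        return False

    def insert(self, record):
        if self.abstract:
            raise ValueError(f"abstract class {self.name} cannot hold records")
        self.records.append(record)


def select_from(target, classes):
    """Polymorphic query: records of `target` and of every sub-class."""
    return [r for cls in classes if cls.is_a(target) for r in cls.records]


person = SchemaClass("person", abstract=True)   # foundation only, no records
user = SchemaClass("user", parent=person)
speaker = SchemaClass("speaker", parent=user)
classes = [person, user, speaker]

user.insert({"name": "Uli"})
speaker.insert({"name": "Fabrizio"})

everyone = select_from(person, classes)        # users and speakers
only_speakers = select_from(speaker, classes)  # speakers only
```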
  • 16. Let’s create a Document: { "@rid": "#12:216", "@class": "user", "name": "Fabrizio", "meetups": [ { "name": "HUG Ireland", "city": "Dublin", "since": "14-03-2014" } ], "details": { "@type": "d", "@class": "user_details", "city": "Dublin", "nationality": "IT" } } Callouts: @rid is the immutable Record ID; @class is the logical set; name is a property; meetups is an array of objects; details is an embedded document. With a traditional Document DB you have to duplicate your data to some degree. The degree depends on how complex the interdependencies of the application domain are. OrientDB combines the unique flexibility of documents with the power of graphs to unlock the business value of Document Data Relationships.
  • 17. Graphs: everything old is new again https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
  • 18. What is a Graph Database? “A Graph Database is any storage system that provides index-free adjacency” Marko A. Rodriguez, The Graph Traversal Pattern [2010]. G = (V, E), where G is the graph, V the set of vertices and E the set of edges
  • 19. What’s wrong with joins? • Given a User (Fabrizio) • Find Fabrizio (id=10) in the member table: O(log n) • Find 18 and 24 (HUG Ireland & Microservices) in the Meetup table: O(log n) Tables: User (name, id): (Fabrizio, 10), (Uli, 12), (John, 13), (Eddie, 88). member (user_id, meetup_id): (10, 18), (10, 24), (13, 18), (88, 66). Meetup (id, name): (18, HUG Ireland), (57, AWS Users), (24, Microservices), (66, Scala Meetup). • Joins are computed every time you cross relationships • Time complexity grows with data: O(log n) per lookup • Joining 3-4 tables with millions of records could create billions of combinations
  • 20. The Graph as an Index • Given a User (Fabrizio) • Traverse the member edges to reach HUG Ireland, O(1), and Microservices, O(1) • Fabrizio is the index to reach the linked Meetups! • Every vertex and edge is “hard wired” to its adjacent vertex or edge • Traversing an edge does not require complex computation, near O(1) • The traversal time is not affected by the database size. Diagram: Fabrizio connected by member edges to HUG Ireland and Microservices. Easier to sketch!
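The contrast between slides 19 and 20 can be made concrete in a few lines. This is a sketch under stated assumptions: the rows come from the slide tables, binary search over sorted lists stands in for the relational indexes, a plain dict stands in for the index on User, and the `Vertex` class with its `out_member` list is a hypothetical stand-in for index-free adjacency rather than OrientDB's storage format.

```python
from bisect import bisect_left

# Relational side: rows from the slide tables, kept sorted like an index.
users = {"Fabrizio": 10, "Uli": 12, "John": 13, "Eddie": 88}
member = sorted([(10, 18), (10, 24), (13, 18), (88, 66)])  # (user_id, meetup_id)
meetups = sorted([(18, "HUG Ireland"), (57, "AWS Users"),
                  (24, "Microservices"), (66, "Scala Meetup")])  # (id, name)


def meetups_by_join(user_name):
    """Resolve the relationship with index lookups, O(log n) each."""
    user_id = users[user_name]
    names = []
    i = bisect_left(member, (user_id,))          # first membership row of the user
    while i < len(member) and member[i][0] == user_id:
        meetup_id = member[i][1]
        j = bisect_left(meetups, (meetup_id,))   # one more lookup per row
        names.append(meetups[j][1])
        i += 1
    return names


# Graph side: each vertex holds direct references to its neighbours.
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out_member = []  # adjacent vertices, reachable without any lookup


fabrizio = Vertex("Fabrizio")
hug_ireland = Vertex("HUG Ireland")
microservices = Vertex("Microservices")
fabrizio.out_member += [hug_ireland, microservices]


def meetups_by_traversal(user):
    """Follow the stored references; cost depends on the edges, not the table size."""
    return [meetup.name for meetup in user.out_member]
```

Both functions return the same meetups for Fabrizio; the difference is that the join version pays a search per hop that grows with the tables, while the traversal version only touches the vertices it is already connected to.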
  • 22. Combine Documents with Graphs: Multi-relational Document Graph. User vertex: { "@rid": "12:216", "@class": "user", "name": "Fabrizio", "details": { "@type": "d", "@class": "user_detail", "city": "Dublin", "nationality": "IT" } } Meetup vertex: { "@rid": "13:12", "@class": "meetup", "name": "HUG Ireland", "city": "Dublin" } Member edge: { "@rid": "14:32", "@class": "member", "since": "14-03-2014", "in": "12:216", "out": "13:12" } with out_member=14:32 and in_member=14:32 on the vertices. Talk edge: { "@rid": "15:79", "@class": "talk", "title": "OrientDB", "on": "11-04-2016", "in": "12:216", "out": "13:12" } with out_talk=15:79 and in_talk=15:79 on the vertices
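The idea on slide 22, edges that are themselves documents sitting between vertex documents, can be sketched with a dictionary keyed by record ID. This is an illustration only: the `db` dictionary and the `out_edges` / `out_vertices` helpers are hypothetical, and the edge direction is an assumption (each edge is taken to leave the user, recorded in its `out` field, and enter the meetup, recorded in its `in` field).

```python
# Every record, vertex or edge, is a document addressed by its record ID.
db = {
    "12:216": {"@class": "user", "name": "Fabrizio",
               "out_member": ["14:32"], "out_talk": ["15:79"]},
    "13:12": {"@class": "meetup", "name": "HUG Ireland", "city": "Dublin",
              "in_member": ["14:32"], "in_talk": ["15:79"]},
    # Edges carry their own properties plus the IDs of both endpoints.
    # Direction is assumed here: out = source vertex, in = target vertex.
    "14:32": {"@class": "member", "since": "14-03-2014",
              "out": "12:216", "in": "13:12"},
    "15:79": {"@class": "talk", "title": "OrientDB", "on": "11-04-2016",
              "out": "12:216", "in": "13:12"},
}


def out_edges(rid, label):
    """Edge documents of the given class leaving the vertex `rid`."""
    return [db[edge_rid] for edge_rid in db[rid].get(f"out_{label}", [])]


def out_vertices(rid, label):
    """Vertices reached by following those edges."""
    return [db[edge["in"]] for edge in out_edges(rid, label)]


member_since = [edge["since"] for edge in out_edges("12:216", "member")]
talk_venues = [vertex["name"] for vertex in out_vertices("12:216", "talk")]
```

Because the same pair of vertices is connected by two edge classes (`member` and `talk`), the structure is multi-relational: each relationship keeps its own properties without duplicating the user or the meetup.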
  • 23. Would you believe me if I said you can query documents and graphs with SQL-like syntax? Show me something now! OK, time for a quick demo. Image: http://www.sharegoodstuffs.com/2011_12_12_archive.html
  • 24. Use Case: raise standards in Irish Public Office
  • 25. The challenges • Aggressive deadline • Large amount of data from different sources in different formats • Messy, dirty data • Connecting records from different sources that represent the same thing without a common identifier • Multi-step traversal of fixed and inferred links to identify disparate entities connected by a path
  • 27. Technical Details • Main Language: Groovy • Database Type: OrientDB Embedded • Fuzzy Inference Engine: Duke • minHash proximity index based on Lucene to avoid a cartesian product • Probabilistic model with configurable statistical algorithms (Levenshtein, NGram, Soundex, Custom, etc.) to identify the same entities despite differences • End-to-end process time < 10 min • Deliverable: Database • Preset queries to answer the main questions (analysts can add or modify WHERE conditions on their own) • GraphView to visually search and visualise data
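One of the comparators named on slide 27, Levenshtein distance, can be sketched in plain Python to show how two spellings of the same entity might be matched. This is not Duke's API or its probabilistic model: the `similarity` normalisation, the `same_entity` helper and the 0.85 threshold are assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def same_entity(a, b, threshold=0.85):
    """Hypothetical matching rule: case-insensitive similarity above a threshold."""
    return similarity(a.lower(), b.lower()) >= threshold


likely_match = same_entity("Fabrizio Fortino", "fabrizio fortinno")
unlikely_match = same_entity("Fabrizio", "Eddie")
```

Comparing every record with every other one is the cartesian product the slide warns about, which is why a proximity index is used to narrow the candidate pairs before a comparator like this runs.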
  • 28. What people from home perceived: ≈ 20K tweets; #rteinvestigates was the top hashtag in Ireland for 24 hours
  • 29. OK but what about Big Data? “While we’ve long understood the value of Big Data to better understand how people interact with us, we’ve noticed an alarming trend of Big Data envy: organizations using complex tools to handle “not-really-that-big” Data. Distributed map-reduce algorithms are a handy technique for large data sets, but many data sets we see could easily fit in a single node relational or graph database. Even if you do have more data than that, usually the best thing to do is to first pick out the data you need, which can often then be processed on such a single node” ThoughtWorks Technology Radar, 5 April 2016
  • 31. Resources • http://martinfowler.com/bliki/PolyglotPersistence.html • https://en.wikipedia.org/wiki/Multi-model_database • http://orientdb.com/ • https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg • http://arxiv.org/pdf/1004.1001.pdf • https://www.udemy.com/orientdb-getting-started/ • http://www.rte.ie/news/investigations-unit/2015/1207/751833-rte-investigates/ • https://github.com/larsga/Duke • https://www.thoughtworks.com/radar
  • 32. Q & A. Thank you! Fabrizio Fortino @fabriziofortino 11th April 2016 #HUGIreland @boistartups