Making Sense of Graph Database Technologies

Making Sense of Graph Database
Noel Yuhanna, Principal Analyst
Forrester Research
Teleconference

Today, we live in a
digital world that’s
generating billions
of data points every
millisecond.

© 2014 Forrester Research, Inc. Reproduction Prohibited
of data is on the
public Net.

“Why is the amount of data stored by your firm increasing?”
(Please select the top three reasons.)
We are digitizing everything.

…but the bad news is that we are creating too
many data silos that fail to deliver unified
and connected data apps.

Drivers and trends affecting databases
DBMS strategy
› Increased data volumes
› Strong data security controls
› Increased transaction volume
› Nonstop 24x7 availability
› All types of data storage
› New apps — social, mobile apps
› Cost control and stalled budget
› Faster real-time data access
› Integrated app/data
› Unpredictable workloads

New business requirements are making
older data management methods
inadequate
› Business challenges:
• Customer is the king – offer more personalization
• Deliver more innovative products
• Deliver more customer-driven products and services…
› Technology challenges:
• Increasing data volume, velocity, silos
• Need for continuous availability of information
• Increasing number of users, Apps, workloads, patterns

New applications are changing database
requirements . . .
› Social networking apps
› Mobile applications
› High-performance apps
› Real-time apps
› Real-time data mashups
› Departmental and collaboration
› Predictive analytics
Real-time data
Unstructured data
Faster access
Self-service
Automated
Many are building a dozen apps every week!

Source: June 7, 2013, “The Steadily Growing Database Market Is Increasing Enterprises’ Choices” Forrester report
Database categorization based on
function

Source: February 13, 2014, “TechRadar™: Enterprise DBMS, Q1 2014” Forrester report
TechRadar: Database Management 2014

Connected data has become critical for
any business to succeed
Customer
Company
Products
Friends
GeoLocation
Devices
Services
Support
Billing
Tweets
Facebook
Yelp
LinkedIn

…but dealing with connected data is
complex and resource intensive
› Imagine doing a 100-table join in a relational database.
• How long will it take to run?
• How long will your SQL statement be?
• What kind of indexes would be needed?
• Will it try to create a Cartesian product?
• What kind of system resources are needed?

Graph Databases Overcome these
issues… offer new possibilities!
› Graph databases simplify and speed up access to data containing
many relationships.
› Graph structures consist of nodes (things), edges (relationships),
and properties (key values) to store and access complex data
relationships which is challenging in other database types.
› Graph databases directly support relationships and can rapidly
access complex networks of connected data.
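
The node, edge and property structure described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not any vendor's API: the `PropertyGraph` class, the `FRIEND` label and the sample names are all assumptions made for the example. It shows how a two-hop question ("friends of friends") becomes a direct walk over stored relationships rather than a multi-table join.

```python
from collections import defaultdict


class PropertyGraph:
    """Hypothetical in-memory property graph: nodes and edges carry key/value properties."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.out_edges = defaultdict(list)  # node id -> [(label, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        self.out_edges[source].append((label, target, props))

    def neighbors(self, node_id, label):
        """Follow stored edges with the given label; no join or table scan is involved."""
        return [target for (edge_label, target, _) in self.out_edges[node_id]
                if edge_label == label]


def friends_of_friends(graph, person):
    """Two-hop traversal: people reachable via FRIEND edges who are not already friends."""
    direct = set(graph.neighbors(person, "FRIEND"))
    result = set()
    for friend in direct:
        for candidate in graph.neighbors(friend, "FRIEND"):
            if candidate != person and candidate not in direct:
                result.add(candidate)
    return result


g = PropertyGraph()
for name in ("alice", "bob", "carol", "dave"):
    g.add_node(name, kind="Customer")
g.add_edge("alice", "FRIEND", "bob", since=2012)
g.add_edge("bob", "FRIEND", "alice", since=2012)
g.add_edge("bob", "FRIEND", "carol", since=2013)
g.add_edge("carol", "FRIEND", "dave", since=2014)

print(sorted(friends_of_friends(g, "alice")))
```

The cost of the traversal depends on the number of edges actually visited, not on the total size of the data set, which is the property the slide is pointing at.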

Graph databases support many use
cases …
› Social network apps, e.g., Facebook, Twitter, LinkedIn
› Pattern analysis, e.g., detecting fraud, consumer behavior
› Analysis of massive data: communication/network management
› Recommendation engines
› Consumer personalization
› Mobile apps
› Gaming
› Up-sell/cross-sell
› Real-time apps
› Others

Recommendations
› Graph Databases should be part of your DBMS strategy
› NoSQL has become more mature, with 25% adoption
› Graph databases offer many use cases that go beyond traditional
application and business requirements, so think differently
› Train your developers, data architects and administrators on graph
databases
› Not all applications are a good fit for graph, so pick ones that
deal with large amounts of connected data
› Start small and grow. Build smaller graph apps to understand their
business and technology value, and then expand to larger ones.
› Graph databases offer endless possibilities. Remember that
enterprises that leverage data more efficiently are more likely to
succeed and have a competitive edge.

Thank you
Noel Yuhanna
+1 650.581.3807
nyuhanna@forrester.com
www.forrester.com

Discovering Valuable Connections
in Big Data
Making Sense of Graph
Database Technologies
Brian Clark, VP Product Management
Objectivity, Inc.
© 2014, Confidential

Agenda
• An overview of NoSQL
• Why Graph?
• Graph databases
• Business value – what to look for?
• Technical value ‐ what to look for?
• Objectivity Performance Centre

NOSQL
An Overview of Four Primary NOSQL Technologies.

The “Not Only SQL” Market
Axes: Connected Data; Query and Navigational Complexity
• Big Table Clones
– BigTable (Google), Cassandra, Cloudera, HBase, Hypertable
• Key-Value Stores
– Dynamo (Amazon), Voldemort (LinkedIn), Citrusleaf, Membase, Riak, Tokyo Cabinet
• Document Databases
– CouchOne, MongoDB, OrientDB, Terrastore
• Graph & Object Databases
– FlockDB (Twitter), AllegroGraph, DEX, InfoGrid, Neo4j, Titan
• Scalable, Distributed Graph Database

© Copyright 2014 Objectivity, Inc. All Rights Reserved.
Strictly Confidential.
Big Data Tools
• Massively Parallel Data Streams
• Ingest
• Process
– Hadoop, Map/Reduce
• Store/Database
– NoSQL, Files, RDBMS, Objectivity/DB, Graph/Object DB, InfiniteGraph
• Analysis, Visualization
– Palantir, Custom Analytics & Visualization, Analytics & Visualization Apps

The New Big Data Workflow
Ingest → Process & Correlation → Analysis & Visualization

Why Graph?
According to a report by industry observer DB‐Engines, “Graph DBMSs are
gaining in popularity faster than any other database category,” growing 300
percent since January of last year.

Why Graph?
The real world is not a set of neatly lined rows and columns.
• It’s all about understanding relationships and connections
• A graph’s relationship-based data model enables modeling of
real-world, complex, interconnected use cases.
• Find hidden value to improve business decisions, efficiencies
and increase growth.
• High performance, complex query capabilities.

Object Databases
OID OBJECT
Connections
• Data Model:
– Every object instance belongs to a
class (type) and has a group of
values (properties).
– Every object instance has a unique
object identifier [OID].
– Connections implemented using
OIDs.
• Examples:
– Objectivity/DB and db4objects.
• Strengths:
– Simple, powerful data model that
includes inheritance and
polymorphism.
– Good scalability if sharding is
supported.
– Uses object identifiers instead of
“JOINs” to support very fast
navigational operations.
• Weaknesses:
– Supports standard object oriented
languages but isn't supported by a
wide range of third party tools in the
way that SQL is.
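
The object data model above can be illustrated with a short Python sketch. It assumes a hypothetical in-memory `ObjectStore`; the class names and methods are invented for the example and are not the Objectivity/DB or db4objects API. It shows the three points from the slide: each instance belongs to a class, each instance gets a unique OID, and connections are held as OIDs so that navigation is a direct lookup rather than a join.

```python
import itertools


class ObjectStore:
    """Hypothetical in-memory store: every object receives a unique OID on insert."""

    def __init__(self):
        self._objects = {}
        self._next_oid = itertools.count(1)

    def put(self, obj):
        oid = next(self._next_oid)
        obj.oid = oid
        self._objects[oid] = obj
        return oid

    def get(self, oid):
        # Navigation is a direct lookup by identifier, not a value-based join.
        return self._objects[oid]


class Party:
    """Base class; subclasses inherit its properties."""

    def __init__(self, name):
        self.oid = None
        self.name = name
        self.connection_oids = []  # connections are stored as OIDs

    def describe(self):
        return f"Party {self.name}"


class Customer(Party):
    def describe(self):  # polymorphic override
        return f"Customer {self.name}"


class Company(Party):
    def describe(self):
        return f"Company {self.name}"


store = ObjectStore()
acme = Company("Acme")
ann = Customer("Ann")
acme_oid = store.put(acme)
ann_oid = store.put(ann)
ann.connection_oids.append(acme_oid)

# Follow Ann's connections by OID; each target answers describe() for its own class.
connected = [store.get(oid).describe() for oid in ann.connection_oids]
print(connected)
```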

Graph Databases
Diagram labels: VERTEX, EDGE
• Data model:
– Node (Vertex) and Relationship
(Edge) objects.
– Directed.
• Examples:
– InfiniteGraph, Neo4j, OrientDB,
AllegroGraph, TitanDB.
• Strengths:
– Extremely fast for connected data.
– Scales out, typically.
– Easy to query (navigation).
– Simple data model.
• Weaknesses:
– Requires conceptual shift... a
different way of thinking.
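
As a rough illustration of "easy to query (navigation)" on a directed model, the sketch below finds a shortest path with a breadth-first search over directed edges. The `shortest_path` function and the sample vertices are assumptions for the example; real graph databases expose their own query and navigation APIs.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over directed edges; returns a list of vertices or None."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in previous:
                previous[neighbor] = vertex
                queue.append(neighbor)
    return None


edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("E", "C"), ("C", "F")]
forward = shortest_path(edges, "A", "F")
backward = shortest_path(edges, "F", "A")
print(forward, backward)
```

Because edges are directed, a path from A to F exists while the reverse query finds nothing; this is the kind of conceptual shift the slide lists as a weakness for newcomers.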

Graph Computing
• Graph Databases (“user time”)
– Transactions, Indices, Concurrency, Availability, Schema
– Examples: IG, Neo4j (Neo Technology), Titan (Aurelius), Dex (Sparsity)
• Graph Analytics (“business time”)
– Processing, Stateless, Batch, Supersteps, Algorithms
– Examples: GraphLab, Faunus (Aurelius), Apache Giraph / Pregel (Google)
• Queries, Pathfinding, Graphviews, Pipelining, Formatters, Exporters
Graph DB Use Cases

Sample of Graph Database Options

BUSINESS VALUE
What to look for?

Business Values
• Enterprise Ready and Proven
– Optimized for Multi‐user/ multi‐application environments
– Distributed and scalable
– Real‐time access to data
– High performance search and discovery
• Lower Total Cost of Ownership
– How does the graph database maximize the use of
expensive scarce resources (cores, memory, disk and
network)?

Business Value – Enterprise Ready & Proven
• Is the graph database optimized for the enterprise:
– Concurrency ‐ are you able to run many threads, many
processes against the graph database?
– Can many different applications from many different
locations access the graph database?
• Does the graph database work in a distributed
environment:
– Are both distributed data and processing supported?
– Does it scale out (rather than scale up)?
• What levels of support are available?

Business Value – Enterprise Ready & Proven
• Data availability:
– Is the graph data immediately available?
• Or what is the latency?
– Are indexes immediately consistent?
• Some 3rd party indexes are not immediately consistent.
– Can you use 3rd party key/value stores during ingest?
• Can improve ingest performance.
• High Performance Search and Discovery:
– Does the graph database support schema‐less or schema‐full
approaches?
• Trade off between flexibility and performance.
• Trade off between flexibility and reliability (constraints implemented
by schema).

Business Value ‐ TCO
• Lower TCO (Total Cost of Ownership):
– Can the graph processing be distributed across multiple
computers?
– Does the whole graph have to fit in memory?
– How is the network utilized? Send the data to the
processing or send the processing to the data?
– How much space does the graph database occupy on disk?

TECHNICAL VALUE
What to look for?

Performance Measurement
• Measurement Criteria
• Performance:
– Measure throughput – ingest nodes &
edges per second; lookups per second;
traversals (paths, hops) per second.
• Parallelism (distributed):
– Scalability:
• Processing;
• Storage;
• Measurement ‐ how much, how many?
– Concurrency:
• Multi‐threaded; multi‐user; multi‐
computer;
• Measurement ‐ #concurrent users,
transactions.
• Usability
• Different resources can affect
performance:
• CPU:
– How many, # cores?
• Memory:
– How much?
• Storage:
– Local & remote? How many?
– Type – SSD or rotational?
• Network:
– Bandwidth & latency?
• Technology:
– price/performance?
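
A throughput measurement of the kind listed above (ingest per second, lookups per second) can be sketched as follows. The harness is generic Python; the dictionary standing in for a graph database, and the function names, are assumptions for illustration only. A real benchmark would call the database's own client API and would also record the hardware factors the slide mentions.

```python
import time


def measure_throughput(operation, items):
    """Run operation(item) for every item; return (count, seconds, operations per second)."""
    start = time.perf_counter()
    count = 0
    for item in items:
        operation(item)
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate


# Stand-in "database": a dict of adjacency lists (an assumption for illustration).
graph = {}


def ingest_edge(edge):
    source, target = edge
    graph.setdefault(source, []).append(target)
    graph.setdefault(target, [])


def lookup(vertex):
    return graph[vertex]


edges = [(i, (i * 7 + 1) % 10_000) for i in range(10_000)]
ingested, _, ingest_rate = measure_throughput(ingest_edge, edges)
looked_up, _, lookup_rate = measure_throughput(lookup, range(10_000))
print(f"ingest: {ingest_rate:,.0f} edges/s, lookup: {lookup_rate:,.0f} lookups/s")
```

Absolute numbers from a sketch like this mean little on their own; the value is in running the same harness against each candidate product on the same hardware.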

Technical Value – A Test Case
• Clickstream data is a good test for concurrency: the files can be split
up for parallel ingest of vertices and edges.
• Clickstream data was loaded into InfiniteGraph in 3 ways:
– single threaded: create vertices, create edges and make connections;
– multiple threads within a single process for improved throughput
(beware of deadlocks): create vertices, create edges and make
connections;
– multiple threads within a single process: create vertices, create edges,
then use pipeline agents to complete the graph, overcoming
deadlocks.
• Clickstream data was used to perform explore and navigate
(shortest path) queries. The generated graph has good
“connectedness” but no real schema.
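
The third loading strategy can be sketched in plain Python. This is not InfiniteGraph's API; the `Graph` class, `pipeline_agent` function and synthetic click data are assumptions that only mirror the idea. Worker threads create vertices and queue edge requests, and a single pipeline agent makes the connections, so no two threads ever wait on each other's vertices.

```python
import queue
import threading


class Graph:
    """Stand-in graph: vertex id -> set of neighbour ids."""

    def __init__(self):
        self.vertices = {}
        self._lock = threading.Lock()  # guards vertex creation only

    def create_vertex(self, vertex_id):
        with self._lock:
            self.vertices.setdefault(vertex_id, set())


def ingest_chunk(graph, clicks, pending_edges):
    """Worker thread: create vertices, then hand edge requests to the pipeline."""
    for source, target in clicks:
        graph.create_vertex(source)
        graph.create_vertex(target)
        pending_edges.put((source, target))


def pipeline_agent(graph, pending_edges):
    """Single consumer that completes the graph by connecting existing vertices."""
    while True:
        edge = pending_edges.get()
        if edge is None:  # sentinel: all workers have finished
            break
        source, target = edge
        graph.vertices[source].add(target)


def parallel_ingest(clicks, workers=4):
    graph = Graph()
    pending_edges = queue.Queue()
    agent = threading.Thread(target=pipeline_agent, args=(graph, pending_edges))
    agent.start()
    chunks = [clicks[i::workers] for i in range(workers)]  # split the input for parallel ingest
    threads = [threading.Thread(target=ingest_chunk, args=(graph, chunk, pending_edges))
               for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    pending_edges.put(None)
    agent.join()
    return graph


# Synthetic clickstream: 1,000 clicks between 50 pages (illustrative data only).
clicks = [(f"page{i % 50}", f"page{(i * 3 + 1) % 50}") for i in range(1000)]
graph = parallel_ingest(clicks)
print(len(graph.vertices), sum(len(targets) for targets in graph.vertices.values()))
```

Funnelling connection work through one consumer trades some parallelism for the absence of lock-ordering problems; a production system would likely run several agents partitioned by vertex.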

OBJECTIVITY PERFORMANCE
CENTRE

Goals & Objectives of the Performance Centre
• Improve Understanding of NoSQL Products and
Technologies.
• Internal and External Education and Training.
• Encourage Partner Collaboration.
• Discover Areas for Improvement.
• Develop a Customer Centric Suite of Tests for
Performance Comparisons.

Example of memory use
createTriples ‐ memory usage ‐ MB

           1,000,000  2,000,000  4,000,000  8,000,000  16,000,000  32,000,000
IG33             513        671        692        652         753         561
Neo4j           4790       5512       6298       5639        7347        7324
Titan‐B         1866       3517       5834       5834        6283        6310
Titan‐C          763       1435       3797       2407        4177        3548
Titan‐H         1389       1389

Q&A
Thank you for your time!
Please contact us for a complimentary solution
evaluation at info@objectivity.com
Visit our website www.objectivity.com for access to technical
resources, demos and free trial downloads of our products.
