The document discusses graph databases and their advantages over traditional databases for modeling connected data. It provides an overview of graph databases and what they are used for. Key points include:
- Graph databases simplify and speed up access to connected data by using nodes, edges, and properties to represent relationships. This is challenging for other database types.
- Graph databases are gaining popularity faster than any other database category due to their ability to rapidly access complex networks of connected data.
- Graph databases support use cases involving social networks, recommendations, fraud detection, and more where relationships are important.
- When evaluating graph databases, considerations include performance, scalability, support for real-time access, and lowering the total cost of ownership.
1. Making Sense of Graph Database
Noel Yuhanna, Principal Analyst
Forrester Research
Teleconference
2. Today, we live in a
digital world that’s
generating billions
of data points every
millisecond.
28. Object Databases
[Figure: objects, each identified by an OID, linked by connections]
• Data Model:
– Every object instance belongs to a
class (type) and has a group of
values (properties).
– Every object instance has a unique
object identifier [OID].
– Connections implemented using
OIDs.
• Examples:
– Objectivity/DB and db4objects.
• Strengths:
– Simple, powerful data model that
includes inheritance and
polymorphism.
– Good scalability if sharding is
supported.
– Uses object identifiers instead of
“JOINs” to support very fast
navigational operations.
• Weaknesses:
– Supports standard object-oriented
languages but isn't supported by a
wide range of third-party tools in the
way that SQL is.
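As a rough illustration of the object data model above, the following Python sketch shows connections implemented as stored OIDs and followed by direct lookup instead of a JOIN. The names `StoredObject`, `ObjectStore`, `refs`, and `navigate` are hypothetical and do not correspond to the API of Objectivity/DB or db4objects.

```python
from dataclasses import dataclass, field
from itertools import count

_next_oid = count(1)  # hypothetical OID generator, for illustration only


@dataclass
class StoredObject:
    cls: str                                   # class (type) the instance belongs to
    values: dict                               # property values
    refs: dict = field(default_factory=dict)   # connection name -> list of target OIDs
    oid: int = field(default_factory=lambda: next(_next_oid))  # unique object identifier


class ObjectStore:
    def __init__(self):
        self._by_oid = {}

    def put(self, obj):
        self._by_oid[obj.oid] = obj
        return obj.oid

    def get(self, oid):
        return self._by_oid[oid]

    def navigate(self, oid, connection):
        # Follow stored OIDs directly: one lookup per hop, no JOIN.
        return [self._by_oid[t] for t in self._by_oid[oid].refs.get(connection, [])]


store = ObjectStore()
alice = store.put(StoredObject("Person", {"name": "Alice"}))
acme = store.put(StoredObject("Company", {"name": "Acme"}))
store.get(alice).refs.setdefault("works_for", []).append(acme)

employers = [o.values["name"] for o in store.navigate(alice, "works_for")]
print(employers)
```

Each hop costs one identifier lookup regardless of how many objects the store holds, which is the intuition behind the "very fast navigational operations" strength listed above.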
29. Graph Databases
[Figure: vertices connected by edges]
• Data model:
– Node (Vertex) and Relationship
(Edge) objects.
– Directed.
• Examples:
– InfiniteGraph, Neo4j, OrientDB,
AllegroGraph, TitanDB.
• Strengths:
– Extremely fast for connected data.
– Scales out, typically.
– Easy to query (navigation).
– Simple data model.
• Weaknesses:
– Requires conceptual shift... a
different way of thinking.
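The data model on this slide (nodes, directed relationships, properties) can be sketched in a few lines of Python. This is a minimal in-memory illustration under assumed names (`Graph`, `add_node`, `add_edge`, `neighbors`); it is not the API of any of the products listed above.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.nodes = {}                      # node id -> properties
        self.out_edges = defaultdict(list)   # node id -> [(label, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, label, dst, **props):
        # Edges are directed: stored only on the source node.
        self.out_edges[src].append((label, dst, props))

    def neighbors(self, node_id, label=None):
        return [dst for (lbl, dst, _) in self.out_edges.get(node_id, [])
                if label is None or lbl == label]


g = Graph()
for name in ("ann", "bob", "cat"):
    g.add_node(name, kind="person")
g.add_edge("ann", "FOLLOWS", "bob", since=2012)
g.add_edge("bob", "FOLLOWS", "cat", since=2013)

# Query by navigation: two hops along FOLLOWS edges starting at "ann".
friends_of_friends = {fof
                      for friend in g.neighbors("ann", "FOLLOWS")
                      for fof in g.neighbors(friend, "FOLLOWS")}
print(friends_of_friends)
```

The query is expressed as a walk from node to node, which is the "conceptual shift" the slide mentions: instead of joining tables, the application follows edges.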
35. Business Value – Enterprise Ready & Proven
• Is the graph database optimized for the enterprise:
– Concurrency ‐ are you able to run many threads, many
processes against the graph database?
– Can many different applications from many different
locations access the graph database?
• Does the graph database work in a distributed
environment:
– Are both distributed data and processing supported?
– Does it scale out (rather than scale up)?
• What levels of support are available?
36. Business Value – Enterprise Ready & Proven
• Data availability:
– Is the graph data immediately available?
• Or what is the latency?
– Are indexes immediately consistent?
• Some 3rd party indexes are not immediately consistent.
– Can you use 3rd party key/value stores during ingest?
• Can improve ingest performance.
• High Performance Search and Discovery:
– Does the graph database support schema‐less or schema‐full
approaches?
• Trade off between flexibility and performance.
• Trade off between flexibility and reliability (constraints implemented
by schema).
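The schema trade-off above can be made concrete with a small sketch. The hypothetical `VertexStore` below accepts anything when no schema is given (flexible, no checks on write) and enforces property names and types when one is supplied (extra work per write, but constraints are guaranteed). All names here are assumptions made for illustration.

```python
class SchemaError(ValueError):
    """Raised when a vertex violates the declared schema."""


class VertexStore:
    def __init__(self, schema=None):
        self.schema = schema      # None means schema-less
        self.vertices = []

    def add(self, vertex_type, **props):
        if self.schema is not None:
            expected = self.schema.get(vertex_type)
            if expected is None:
                raise SchemaError(f"unknown vertex type: {vertex_type}")
            unexpected = set(props) - set(expected)
            if unexpected:
                raise SchemaError(f"unexpected properties: {sorted(unexpected)}")
            for name, typ in expected.items():
                if not isinstance(props.get(name), typ):
                    raise SchemaError(f"{name} must be {typ.__name__}")
        self.vertices.append((vertex_type, props))


# Schema-less: any shape is accepted, so errors surface later (if at all).
flexible = VertexStore()
flexible.add("Person", nickname=42)

# Schema-full: the constraint is checked on every write.
strict = VertexStore(schema={"Person": {"name": str}})
strict.add("Person", name="Ann")
try:
    strict.add("Person", name=42)
    rejected = False
except SchemaError:
    rejected = True
print(rejected)
```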
37. Business Value ‐ TCO
• Lower TCO (Total Cost of Ownership):
– Can the graph processing be distributed across multiple
computers?
– Does the whole graph have to fit in memory?
– How is the network utilized? Send the data to the
processing or send the processing to the data?
– How much space does the graph database occupy on disk?
39. Performance Measurement
• Measurement Criteria
• Performance:
– Measure throughput – ingest nodes &
edges per second; lookups per second;
traversals (paths, hops) per second.
• Parallelism (distributed):
– Scalability:
• Processing;
• Storage;
• Measurement ‐ how much, how many?
– Concurrency:
• Multi‐threaded; multi‐user; multi‐
computer;
• Measurement ‐ #concurrent users,
transactions.
• Usability
• Different resources can affect
performance:
• CPU:
– How many, # cores?
• Memory:
– How much?
• Storage:
– Local & remote? How many?
– Type – SSD or rotational?
• Network:
– Bandwidth & latency?
• Technology:
– price/performance?
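The throughput measurements listed above (ingest of nodes and edges per second, lookups per second, traversals per second) could be collected with a harness along these lines. This is only a sketch against an in-memory dictionary standing in for a graph database; the graph shape, the size `N`, and the helper names are assumptions, and real numbers would depend on the resources the slide lists.

```python
import time
from collections import deque

N = 10_000  # assumed number of nodes for the synthetic graph


def ingest(graph):
    # Each node gets two outgoing edges; returns nodes plus edges written.
    for i in range(N):
        graph[i] = [(i + 1) % N, (i + 7) % N]
    return N + 2 * N


def lookups(graph):
    # Point lookups by node id; returns the number of successful lookups.
    return sum(1 for i in range(N) if i in graph)


def traverse(graph, start=0, max_hops=50):
    # Breadth-first traversal limited to max_hops; returns edges followed.
    seen = {start}
    frontier = deque([(start, 0)])
    edges_followed = 0
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nxt in graph[node]:
            edges_followed += 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return edges_followed


def per_second(operation, *args):
    start = time.perf_counter()
    operations = operation(*args)
    elapsed = time.perf_counter() - start
    rate = operations / elapsed if elapsed > 0 else float("inf")
    return operations, rate


graph = {}
results = {
    "ingest (nodes + edges)": per_second(ingest, graph),
    "lookups": per_second(lookups, graph),
    "traversal (edges followed)": per_second(traverse, graph),
}
for name, (operations, rate) in results.items():
    print(f"{name}: {operations} operations, {rate:,.0f} per second")
```

For a distributed product the same three counters would be recorded per client and per machine, alongside the number of concurrent users and transactions.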
40. Technical Value – A Test Case
• Clickstream data is a good test for concurrency: the files can be
split up for parallel ingest of vertices and edges.
• Clickstream data loaded into InfiniteGraph 3 ways:
• single threaded – create vertices, create edges and make connections;
• multiple threads within single process for improved throughput
(beware of deadlocks) – create vertices, create edges and make
connections;
• multiple threads within single process, create vertices, create edges,
then use pipeline agents to complete the graph overcoming
deadlocks.
• Clickstream data was then used to run explore and navigate
(shortest path) queries. The generated graph has good
“connectedness” but no real schema.
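The third ingest strategy and the shortest path query can be sketched as follows. This is a toy Python analogue, not InfiniteGraph code: worker threads create vertices and queue edge requests, and a single hypothetical "pipeline agent" thread completes the graph, so no two threads ever contend for the same pair of vertices. The sample clicks, the undirected treatment of edges, and all function names are assumptions made for the example.

```python
import queue
import threading
from collections import defaultdict, deque

# Made-up clickstream records: (user, page visited).
clicks = [("u1", "/home"), ("u1", "/search"), ("u2", "/home"),
          ("u2", "/cart"), ("u1", "/cart"), ("u3", "/search")]

vertices = set()
vertex_lock = threading.Lock()
adjacency = defaultdict(set)     # written only by the pipeline agent
edge_requests = queue.Queue()
STOP = object()                  # sentinel that tells the agent to finish


def ingest_chunk(chunk):
    # Workers create vertices and defer edge creation to the agent.
    for user, page in chunk:
        with vertex_lock:
            vertices.update((user, page))
        edge_requests.put((user, page))


def pipeline_agent():
    # A single consumer connects vertices, so edge writes never deadlock.
    while True:
        item = edge_requests.get()
        if item is STOP:
            break
        a, b = item
        adjacency[a].add(b)
        adjacency[b].add(a)      # treated as undirected for exploration


agent = threading.Thread(target=pipeline_agent)
agent.start()
chunks = [clicks[i::3] for i in range(3)]   # split the input for parallel ingest
workers = [threading.Thread(target=ingest_chunk, args=(c,)) for c in chunks]
for w in workers:
    w.start()
for w in workers:
    w.join()
edge_requests.put(STOP)
agent.join()


def shortest_path(start, goal):
    # Breadth-first search; returns the list of vertices or None.
    previous = {start: None}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency[node]:
            if nxt not in previous:
                previous[nxt] = node
                frontier.append(nxt)
    return None


print(shortest_path("u3", "/cart"))
```

Deferring edge creation trades some latency (the graph is complete only after the agent drains its queue) for freedom from lock-ordering problems during ingest.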