Scalable Databases - From Relational Databases To Polyglot Persistence

Sergio Bossa – sergio.bossa@gmail.com
http://twitter.com/sbtourist
Javaday IV – Rome – 30 January 2010

About Me
● Software architect and engineer
● Gioco Digitale (online gambling and casinos)
● Open Source enthusiast
● Terracotta Messaging (http://forge.terracotta.org)
● Terrastore (http://code.google.com/p/terrastore)
● Actorom (http://code.google.com/p/actorom)
● (Micro-)Blogger
● http://twitter.com/sbtourist
● http://sbtourist.blogspot.com


Five fallacies of data-centric systems

Data model is static.
Data volume is predictable.
Data access load is predictable.
Database topology doesn't change.
Database never fails.


Scalable databases in action
● Scale your database to address the fallacies above.
● Scale to handle heterogeneous data.
● Scale to handle more data.
● Scale to handle more load.
● Scale to handle topology changes due to:
● Unplanned growth.
● Unpredictable failures.


Scaling Relational Databases


Master-Slave replication
● Master - Slave replication.
● One (and only one) master database.
● One or more slaves.
● All writes go to the master.
● Replicated to slaves.
● Reads are balanced among master and slaves.
● Major issues:
● Single point of failure.
● Single point of bottleneck.
● Static topology.
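
The routing rules above can be sketched in a few lines. This is a minimal illustration, not a real replication protocol: the class name `MasterSlaveRouter` is hypothetical, plain dictionaries stand in for database nodes, and replication is synchronous only to keep the example short.

```python
import itertools


class MasterSlaveRouter:
    """Hypothetical router: writes hit the master, reads rotate over all nodes."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        # Reads are balanced among master and slaves (round-robin here).
        self._readers = itertools.cycle([self.master] + self.slaves)

    def write(self, key, value):
        # All writes go to the single master...
        self.master[key] = value
        # ...and are then replicated to every slave (synchronously, for brevity).
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        return next(self._readers).get(key)


router = MasterSlaveRouter(master={}, slaves=[{}, {}])
router.write("user:1", "alice")
reads = [router.read("user:1") for _ in range(3)]  # one read per node
```

Note how every write passes through `self.master`: that is the single point of failure and the write bottleneck listed among the major issues.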


Master-Master replication
● Master - Master replication.
● Two or more masters.
● Writes and reads can go to any master node.
● Writes are replicated among masters.
● Major issues:
● Limited performance and scalability (typically due to 2PC).
● Complexity.
● Static topology.


Vertical partitioning
● Vertical partitioning.
● Put tables belonging to different functional areas on different database nodes.
● Scale your data and load by function.
● Move joins to the application level.
● Major issues:
● No longer truly relational.
● What if a functional area grows too much?
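
"Move joins to the application level" could look like the sketch below. The two functional areas (users and orders), their fields, and the function name are made up for illustration; in-memory structures stand in for two separate database nodes.

```python
# Two hypothetical functional areas, each living on its own database node.
users_db = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders_db = [
    {"order_id": 10, "user_id": 1, "total": 30},
    {"order_id": 11, "user_id": 2, "total": 15},
    {"order_id": 12, "user_id": 1, "total": 5},
]


def orders_with_user_names(orders, users):
    """Join orders to users in application code instead of in SQL."""
    return [
        {
            "order_id": order["order_id"],
            "name": users[order["user_id"]]["name"],  # lookup on the other node
            "total": order["total"],
        }
        for order in orders
    ]


joined = orders_with_user_names(orders_db, users_db)
```

The database no longer enforces the relation between the two areas: the application has to perform the lookup and cope with missing rows itself.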


Horizontal partitioning
● Horizontal partitioning.
● Split tables by key and put partitions (shards) on different nodes.
● Scale your data and load by key.
● Move joins to the application level.
● Needs some kind of routing.
● Major issues:
● No longer truly relational.
● What if your partition grows too much?
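
One simple form of the routing mentioned above is hash-modulo sharding, sketched here. `ShardRouter` is a hypothetical name, dictionaries stand in for shard nodes, and hash-modulo is only one possible routing scheme (range-based or directory-based routing are alternatives).

```python
import hashlib


class ShardRouter:
    """Hypothetical router that maps each key to one of N shards."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, key):
        # Use a stable hash so the same key always lands on the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


shard_router = ShardRouter(shard_count=4)
for i in range(20):
    shard_router.put(f"user:{i}", i)
```

With plain modulo routing, changing the number of shards remaps most keys, which is one reason growing a partitioned relational setup is painful.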


Caching
● Put a cache in front of your database.
● Distribute the cache across nodes.
● Write-through for scaling reads.
● Write-behind for scaling reads and writes.
● Saves you a lot of pain, but ...
● “Only” scales read/write load.
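
The difference between write-through and write-behind can be sketched as follows. The class names are hypothetical, a dictionary stands in for the database, and the explicit `flush()` call replaces the background thread or timer a real write-behind cache would use.

```python
class _Cache:
    """Shared read path: serve from the cache, fall back to the database."""

    def __init__(self, database):
        self.database = database
        self.cache = {}

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.database.get(key)
        return self.cache[key]


class WriteThroughCache(_Cache):
    def write(self, key, value):
        self.cache[key] = value
        self.database[key] = value  # synchronous: the database is always current


class WriteBehindCache(_Cache):
    def __init__(self, database):
        super().__init__(database)
        self.pending = {}

    def write(self, key, value):
        self.cache[key] = value
        self.pending[key] = value  # deferred: the database is updated later

    def flush(self):
        # Apply the queued writes in one batch.
        self.database.update(self.pending)
        self.pending.clear()


through_db, behind_db = {}, {}
through = WriteThroughCache(through_db)
behind = WriteBehindCache(behind_db)
through.write("k", 1)
behind.write("k", 1)
```

After these two writes `through_db` already holds the value while `behind_db` stays empty until `behind.flush()` runs: write-behind takes the database off the write path, at the cost of a window in which unflushed data can be lost.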


Did we solve our fallacies?
● We tried, but ...
● Still bound to the relational model.
● Replication only covers a few use cases.
● Partitioning is hard.
● Caching is good, but not definitive.
● ...
● Can we do any better?


It's Not Only SQL


NOSQL Characteristics
● Main characterizing traits:
● Data Model.
● Data Processing.
● Consistency Model.
● Scale Out.


Data Model (1)
● Column-family based.
● Structure:
● Key-identified rows with a sparse number of columns.
● Columns grouped in families.
● Multiple families for the same key.
● Highlights:
● Dynamically add and remove columns.
● Efficiently access columns in the same group (column family).
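
As a rough mental model, a column-family store can be pictured as nested maps: row key, then column family, then column name. The row keys, family names, and columns below are invented for illustration and do not reflect any product's storage layout.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

table["user:1"]["profile"]["name"] = "alice"
table["user:1"]["profile"]["city"] = "Rome"
table["user:1"]["stats"]["logins"] = 42
table["user:2"]["profile"]["name"] = "bob"  # sparse: no "city", no "stats" family

# Columns can be removed (or added) at any time, row by row.
del table["user:1"]["profile"]["city"]

# All columns of one family are fetched together.
profile = dict(table["user:1"]["profile"])
```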

Data Model (2)
● Document based.
● Structure:
● Key-identified documents.
● Schema-less (but optionally constrained).
– JSON, XML ...
● Highlights:
● Dynamically change the inner structure of documents.
● Efficiently access documents as a unit.


Data Model (3)
● Graph based.
● Structure:
● Nodes to represent your data.
● Relations as meaningful links between nodes.
● Properties to enrich both.
● Highlights:
● Rich data model.
● Efficient, fast traversal of nodes and relations.
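
A toy version of the structure above: nodes and relations both carry properties, and a traversal follows relations of one type. The data and the `traverse` function are illustrative assumptions; a real graph database stores adjacency directly rather than scanning a relation list as this sketch does.

```python
from collections import deque

# Nodes and relations, both enriched with properties.
nodes = {"alice": {"age": 30}, "bob": {"age": 25}, "carol": {"age": 35}}
relations = [
    ("alice", "KNOWS", "bob", {"since": 2008}),
    ("bob", "KNOWS", "carol", {"since": 2009}),
    ("alice", "WORKS_WITH", "carol", {}),
]


def traverse(start, relation_type):
    """Breadth-first traversal following outgoing relations of one type."""
    visited, queue = [start], deque([start])
    while queue:
        current = queue.popleft()
        for source, rel, target, _props in relations:
            if source == current and rel == relation_type and target not in visited:
                visited.append(target)
                queue.append(target)
    return visited


known = traverse("alice", "KNOWS")
```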


Data Model (4)
● Key-Value based.
● Structure:
● Key-identified opaque values.
● Highlights:
● Great flexibility.
● Fast reads/writes for single entries.


Data Processing
● Several options:
● Map/Reduce.
● Predicates.
● Range Queries.
● ...
● One common principle:
● Move processing toward related data.
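
"Move processing toward related data" is the idea behind Map/Reduce: each node processes its own partition and only the small partial results travel over the network. The sketch below simulates this in a single process; the partitions and function names are assumptions made for the example.

```python
from collections import Counter
from functools import reduce

# Each inner list stands for the data held by one node.
partitions = [
    ["nosql", "sql", "nosql"],
    ["graph", "nosql"],
]


def map_partition(words):
    # Runs next to the data: counts words within a single partition.
    return Counter(words)


def reduce_counts(left, right):
    # Merges two partial results into one.
    return left + right


partials = [map_partition(partition) for partition in partitions]
totals = reduce(reduce_counts, partials, Counter())
```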


Consistency Model (1)
● Strict Consistency.
● All nodes ...
● At every point in time ...
● See a consistent view of the stored data.
– Per-key consistency.
– Multi-key consistency.


Consistency Model (2)
● Eventual Consistency.
● Only a subset of all nodes ...
● At a specific point in time ...
● See a consistent view of the stored data.
– Other nodes will serve stale data.
– Other nodes will eventually receive the updates.
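
Eventual consistency can be illustrated with a toy store in which a write lands on one replica and reaches the others only when propagation runs. The class name and the explicit `propagate()` step are assumptions made for the sketch; real systems propagate in the background.

```python
class EventuallyConsistentStore:
    """Hypothetical store: one replica is updated first, the rest catch up later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # queued (replica index, key, value) updates

    def write(self, key, value):
        # Only a subset of nodes (here: replica 0) sees the write immediately.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # The other nodes eventually receive the queued updates.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore(replica_count=3)
store.write("k", "v1")
fresh = store.read("k", replica_index=0)  # up to date
stale = store.read("k", replica_index=1)  # not yet updated
```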


Scale Out (1)
● Master-based.
● Membership managed and broadcast by masters.
● Data consistency guaranteed by masters.
● No single point of failure (SPOF) with active/passive masters.
● No single point of bottleneck (SPOB) with active/active masters or cluster-cluster replication.
● Prone to partitioning failures.


Scale Out (2)
● Peer-to-peer.
● Membership is maintained through multicast or gossip-based protocols.
● Data consistency is maintained through quorum protocols.
● Easier to scale.
● Harder to maintain consistency.
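
A common quorum rule is to pick read and write quorum sizes R and W over N replicas such that R + W > N, so that every read set overlaps every write set. The sketch below assumes that rule; `QuorumStore`, the global version counter, and the fixed choice of which replicas answer are simplifications for illustration only.

```python
class QuorumStore:
    """Hypothetical quorum store over N in-memory replicas."""

    def __init__(self, n, w, r):
        if r + w <= n:
            raise ValueError("R + W must exceed N for read/write sets to overlap")
        self.n, self.w, self.r = n, w, r
        self.replicas = [{} for _ in range(n)]
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # Pretend only the first W replicas acknowledged the write.
        for replica in self.replicas[: self.w]:
            replica[key] = (self.version, value)

    def read(self, key):
        # Pretend only the last R replicas answered; the newest version wins.
        answers = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        return max(answers)[1] if answers else None


quorum = QuorumStore(n=3, w=2, r=2)
quorum.write("k", "v1")
quorum.write("k", "v2")
latest = quorum.read("k")
```

Even though the third replica never received a write, the read still returns the latest value, because at least one replica in the read set also belonged to the write set.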


NOSQL Use Cases
● Use cases vary with the following kinds of data:
● Rich.
● Runtime.
● Hot Spot.
● Massive.
● Computational.
● Do not use the same product for all cases.
● Pick multiple products for different use cases.


NOSQL Products - Cassandra
● Cassandra (http://incubator.apache.org/cassandra)
● Data Model:
● Column-family based.
● Data Processing:
● Range queries, Predicates.
● Consistency:
● Eventual consistency.
● Scalability:
● Peer-to-peer, gossip based.

NOSQL Products - Mongo DB
● Mongo DB (http://www.mongodb.org)
● Data Model:
● Document based (JSON).
● Data Processing:
● Map/Reduce, SQL-like queries.
● Consistency:
● Per-document strict consistency.
● Scalability:
● Replication, partitioning (alpha).

NOSQL Products - Neo4j
● Neo4j (http://neo4j.org)
● Data Model:
● Graph based.
● Data Processing:
● Path traversal, Index-based search.
● Consistency:
● Strict consistency.
● Scalability:
● Replication.

NOSQL Products - Riak
● Riak (http://riak.basho.com)
● Data Model:
● Data Processing:
● Map/Reduce.
● Consistency:
● Scalability:

NOSQL Products - Terrastore
● Terrastore (http://code.google.com/p/terrastore)
● Data Model:
● Data Processing:
● Range queries, Predicates.
● Consistency:
● Per-document strict consistency.
● Scalability:
● Master-based.

NOSQL Products - Voldemort
● Voldemort (http://project-voldemort.com)
● Data Model:
● Key-Value.
● Data Processing:
● None.
● Consistency:
● Scalability:

NOSQL Products and Use Cases


Final words
● A New World.
● New paradigms.
● New use cases.
● New products.
● Don't dismiss the old stuff.
● Relational databases still have their place.
● Embrace change.
● May the NOSQL power be with you.
● Let the Polyglot Persistence era begin!
