In a world where everyone is connected and everyone's data is on the web, scaling your database is no longer a choice: it is a necessity.
In this talk we'll see how to make relational and non-relational databases scale to our needs by understanding and applying old and new patterns. Then we'll look at the most common use cases and how to address them by choosing the right patterns and tools.
Scalable Databases - From Relational Databases To Polyglot Persistence
1. SCALABLE DATABASES
From Relational Databases
To Polyglot Persistence
sergio.bossa@gmail.com
Sergio Bossa http://twitter.com/sbtourist
Sergio Bossa – sergio.bossa@gmail.com
Javaday IV – Rome – 30 January 2010
2. About Me
● Software architect and engineer
● Gioco Digitale (online gambling and casinos)
● Open Source enthusiast
● Terracotta Messaging (http://forge.terracotta.org)
● Terrastore (http://code.google.com/p/terrastore)
● Actorom (http://code.google.com/p/actorom)
● (Micro-)Blogger
● http://twitter.com/sbtourist
● http://sbtourist.blogspot.com
3. Five fallacies of data-centric systems
Data model is static.
Data volume is predictable.
Data access load is predictable.
Database topology doesn't change.
Database never fails.
4. Scalable databases in action
● Scaling your database is a way to address the fallacies above.
● Scale to handle heterogeneous data.
● Scale to handle more data.
● Scale to handle more load.
● Scale to handle topology changes due to:
● Unplanned growth.
● Unpredictable failures.
6. Master-Slave replication
● Master - Slave replication.
● One (and only one) master database.
● One or more slaves.
● All writes go to the master.
● Replicated to slaves.
● Reads are balanced among master and slaves.
● Major issues:
● Single point of failure.
● Single point of bottleneck.
● Static topology.
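The routing rule on this slide can be sketched in a few lines. This is only an illustration, not taken from the talk: the class name MasterSlaveRouter, the node names, and the round-robin read policy are all assumptions.

```python
import itertools


class MasterSlaveRouter:
    """Hypothetical router: writes go to the master, reads are balanced."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        # Reads are balanced among master and slaves (round-robin here).
        self._readers = itertools.cycle([master] + self.slaves)

    def node_for_write(self):
        # Every write goes to the one and only master.
        return self.master

    def node_for_read(self):
        return next(self._readers)


router = MasterSlaveRouter("master", ["slave-1", "slave-2"])
writes = [router.node_for_write() for _ in range(3)]
reads = [router.node_for_read() for _ in range(6)]
```

Every write lands on the same node, which is where the single point of failure and the write bottleneck listed above come from.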
7. Master-Master replication
● Master - Master replication.
● One or more masters.
● Writes and reads can go to any master node.
● Writes are replicated among masters.
● Major issues:
● Limited performance and scalability (typically due to 2PC).
● Complexity.
● Static topology.
8. Vertical partitioning
● Vertical partitioning.
● Put tables belonging to different functional areas on different database nodes.
● Scale your data and load by function.
● Move joins to the application level.
● Major issues:
● No longer truly relational.
● What if a functional area grows too much?
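As a rough illustration of what moving joins to the application level means, the sketch below uses two in-memory structures to stand in for two database nodes. The names users_node and orders_node and their contents are hypothetical.

```python
# Two hypothetical database nodes, one per functional area.
users_node = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders_node = [
    {"order_id": 10, "user_id": 1, "total": 25},
    {"order_id": 11, "user_id": 2, "total": 40},
    {"order_id": 12, "user_id": 1, "total": 5},
]


def orders_with_user_names():
    """Join orders to users in the application, not in the database."""
    joined = []
    for order in orders_node:
        # One lookup on the "users" node per order fetched from the "orders" node.
        user = users_node[order["user_id"]]
        joined.append((order["order_id"], user["name"], order["total"]))
    return joined


result = orders_with_user_names()
```

The database can no longer enforce or optimize this relationship, which is the sense in which the design stops being truly relational.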
9. Horizontal partitioning
● Horizontal partitioning.
● Split tables by key and put partitions (shards) on different nodes.
● Scale your data and load by key.
● Move joins to the application level.
● Needs some kind of routing.
● Major issues:
● No longer truly relational.
● What if your partition grows too much?
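One simple form of the routing mentioned above is hashing the key and taking the result modulo the number of shards. The sketch below assumes that scheme; the shard names and the shard_for function are hypothetical, and real systems may route by range or lookup table instead.

```python
import hashlib

# Hypothetical shard names; each would be a separate database node.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(key, shards=SHARDS):
    """Pick a shard from a stable hash of the key."""
    # hashlib gives the same digest in every process, unlike built-in hash().
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


keys = ["user:1", "user:2", "user:3", "user:42"]
placement = {key: shard_for(key) for key in keys}
```

With plain modulo routing, changing the number of shards remaps most keys, which is one reason an overgrown partition is hard to deal with.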
10. Caching
● Put a cache in front of your database.
● Distribute.
● Write-through for scaling reads.
● Write-behind for scaling reads and writes.
● Saves you a lot of pain, but ...
● “Only” scales read/write load.
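The difference between write-through and write-behind can be sketched as follows. This is a single-process illustration only: plain dicts stand in for the database and the cache, and both class names are hypothetical.

```python
class WriteThroughCache:
    """Writes hit the database synchronously; reads are served from cache."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def put(self, key, value):
        self.db[key] = value       # database is updated on every write
        self.cache[key] = value

    def get(self, key):
        if key not in self.cache:  # cache miss: load from the database
            self.cache[key] = self.db[key]
        return self.cache[key]


class WriteBehindCache(WriteThroughCache):
    """Writes are buffered and pushed to the database later, in batches."""

    def __init__(self, db):
        super().__init__(db)
        self.pending = {}

    def put(self, key, value):
        self.cache[key] = value
        self.pending[key] = value  # database write is deferred

    def flush(self):
        self.db.update(self.pending)
        self.pending.clear()


db_a, db_b = {}, {}
through = WriteThroughCache(db_a)
behind = WriteBehindCache(db_b)
through.put("k", 1)
behind.put("k", 1)
db_b_before_flush = dict(db_b)
behind.flush()
```

Write-behind takes write load off the database at the cost of a window in which the database lags behind the cache.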
11. Did we solve our fallacies?
● We tried, but ...
● Still bound to the relational model.
● Replication only covers a few use cases.
● Partitioning is hard.
● Caching is good, but not definitive.
● ...
● Can we do any better?
12. It's Not Only SQL
13. NOSQL Characteristics
● Main characterizing traits:
● Data Model.
● Data Processing.
● Consistency Model.
● Scale Out.
14. Data Model (1)
● Column-family based.
● Structure:
● Key-identified rows with a sparse number of columns.
● Columns grouped in families.
● Multiple families for the same key.
● Highlights:
● Dynamically add and remove columns.
● Efficiently access columns in the same group (column family).
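A column-family layout can be pictured as nested maps: row key, then family, then column. The sketch below is only a mental model built from plain dicts; the row keys, family names, and read_family helper are hypothetical and do not reflect any product's API.

```python
# row key -> column family -> column name -> value
rows = {
    "user:1": {
        "profile": {"name": "alice", "city": "Rome"},
        "stats": {"logins": 3},
    },
    "user:2": {
        "profile": {"name": "bob"},  # sparse: no "city" column, no "stats" family
    },
}


def read_family(row_key, family):
    """Fetch all columns of one family for one row in a single access."""
    return rows.get(row_key, {}).get(family, {})


# Columns can be added and removed dynamically, row by row.
rows["user:2"]["profile"]["email"] = "bob@example.com"
del rows["user:1"]["profile"]["city"]
```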
15. Data Model (2)
● Document based.
● Structure:
● Key-identified documents.
● Schema-less (but optionally constrained).
– JSON, XML ...
● Highlights:
● Dynamically change the inner structure of documents.
● Efficiently access documents as a unit.
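As a small illustration of the document model, the sketch below stores JSON documents by key in a dict. The store, the put and get helpers, and the sample document are hypothetical.

```python
import json

store = {}  # key -> serialized JSON document


def put(key, document):
    store[key] = json.dumps(document)


def get(key):
    # The whole document is read back as a single unit.
    return json.loads(store[key])


put("post:1", {"title": "Hello", "tags": ["nosql"]})

# No schema to migrate: the inner structure changes with the data.
doc = get("post:1")
doc["author"] = {"name": "alice"}
put("post:1", doc)
updated = get("post:1")
```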
16. Data Model (3)
● Graph based.
● Structure:
● Nodes to represent your data.
● Relations as meaningful links between nodes.
● Properties to enrich both.
● Highlights:
● Rich data model.
● Efficient, fast, traversal of nodes and relations.
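The structure above (nodes, typed relations, properties on both) can be sketched with plain Python containers. The data and the traverse function are hypothetical, and the linear scan over relations is for brevity only: a graph database would follow stored adjacency instead.

```python
from collections import deque

# Nodes and relations both carry properties.
nodes = {"alice": {"age": 30}, "bob": {"age": 25}, "carol": {"age": 35}}
relations = [
    ("alice", "KNOWS", "bob", {"since": 2008}),
    ("bob", "KNOWS", "carol", {"since": 2009}),
]


def traverse(start, relation_type):
    """Breadth-first traversal along outgoing relations of one type."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        order.append(current)
        for source, kind, target, _props in relations:
            if source == current and kind == relation_type and target not in seen:
                seen.add(target)
                queue.append(target)
    return order


reachable = traverse("alice", "KNOWS")
```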
17. Data Model (4)
● Key-Value based.
● Structure:
● Key-identified opaque values.
● Highlights:
● Great flexibility.
● Fast reads/writes for single entries.
18. Data Processing
● Several options:
● Map/Reduce.
● Predicates.
● Range Queries.
● ...
● One common principle:
● Move processing toward related data.
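The principle of moving processing toward the data can be illustrated with a toy Map/Reduce word count. In the sketch below each inner list stands for the data held by one node; the data and function names are hypothetical, and everything runs in one process.

```python
from collections import Counter
from functools import reduce

# Each inner list stands for the data stored on one node.
partitions = [
    ["nosql", "sql", "nosql"],
    ["sql", "graph"],
]


def map_partition(words):
    """Map step: runs next to the data and returns a small partial result."""
    return Counter(words)


def reduce_counts(left, right):
    """Reduce step: merge partial results from different nodes."""
    return left + right


partials = [map_partition(partition) for partition in partitions]
totals = reduce(reduce_counts, partials, Counter())
```

Only the small partial counts would travel over the network, not the raw data.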
19. Consistency Model (1)
● Strict Consistency.
● All nodes ...
● At every point in time ...
● See a consistent view of the stored data.
– Per-key consistency.
– Multi-key consistency.
20. Consistency Model (2)
● Eventual Consistency.
● Only a subset of all nodes ...
● At a specific point in time ...
● See a consistent view of the stored data.
– Other nodes will serve stale data.
– Other nodes will eventually receive the updates.
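A toy model of this behavior: a write is applied to one replica immediately and queued for the others, so a read from another replica can return stale data until propagation runs. The Replica class and the write and propagate functions are hypothetical.

```python
class Replica:
    def __init__(self):
        self.data = {}


replicas = [Replica(), Replica(), Replica()]
pending = []  # updates accepted but not yet applied on every replica


def write(key, value):
    # The first replica sees the update immediately...
    replicas[0].data[key] = value
    # ...the others only once propagation has run.
    for replica in replicas[1:]:
        pending.append((replica, key, value))


def propagate():
    while pending:
        replica, key, value = pending.pop(0)
        replica.data[key] = value


write("x", 1)
stale_read = replicas[2].data.get("x")  # replica 2 has not seen the write yet
propagate()
fresh_read = replicas[2].data.get("x")
```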
21. Scale Out (1)
● Master-based.
● Membership managed and broadcast by masters.
● Data consistency guaranteed by masters.
● No SPOF (single point of failure) with active/passive masters.
● No SPOB (single point of bottleneck) with active/active masters or cluster-cluster replication.
● Prone to partitioning failures.
22. Scale Out (2)
● Peer-to-peer.
● Membership is maintained through multicast or gossip-based protocols.
● Data consistency is maintained through quorum protocols.
● Easier to scale.
● Harder to maintain consistency.
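One common quorum scheme, assumed here for illustration, writes to W of N replicas and reads from R, and guarantees that a read sees the latest write when R + W > N. In the sketch below, the replica layout, the explicit version numbers, and both function names are hypothetical; reads deliberately use the replicas least likely to overlap with the writes.

```python
N = 3
replicas = [{} for _ in range(N)]  # each maps key -> (version, value)


def quorum_write(key, value, version, w):
    """Store the value on the first w replicas."""
    for replica in replicas[:w]:
        replica[key] = (version, value)


def quorum_read(key, r):
    """Ask the last r replicas and keep the answer with the highest version."""
    answers = [replica[key] for replica in replicas[N - r:] if key in replica]
    return max(answers)[1] if answers else None


quorum_write("k", "v1", version=1, w=2)
quorum_write("k", "v2", version=2, w=2)
overlapping = quorum_read("k", r=2)  # R + W = 4 > N: read set meets write set
disjoint = quorum_read("k", r=1)     # R + W = 3 = N: read set can miss the write
```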
23. NOSQL Use Cases
● Use cases revolve around the following kinds of data:
● Rich.
● Runtime.
● Hot Spot.
● Massive.
● Computational.
● Do not use the same product for all cases.
● Pick multiple products for different use cases.
24. NOSQL Products - Cassandra
● Cassandra (http://incubator.apache.org/cassandra)
● Data Model:
● Column-family based.
● Data Processing:
● Range queries, Predicates.
● Consistency:
● Eventual consistency.
● Scalability:
● Peer-to-peer, gossip based.
25. NOSQL Products - MongoDB
● MongoDB (http://www.mongodb.org)
● Data Model:
● Document based (JSON).
● Data Processing:
● Map/Reduce, SQL-like queries.
● Consistency:
● Per-document strict consistency.
● Scalability:
● Replication, partitioning (alpha).
26. NOSQL Products - Neo4j
● Neo4j (http://neo4j.org)
● Data Model:
● Graph based.
● Data Processing:
● Path traversal, Index-based search.
● Consistency:
● Strict consistency.
● Scalability:
● Replication.
27. NOSQL Products - Riak
● Riak (http://riak.basho.com)
● Data Model:
● Document based (JSON).
● Data Processing:
● Map/Reduce.
● Consistency:
● Eventual consistency.
● Scalability:
● Peer-to-peer, gossip based.
28. NOSQL Products - Terrastore
● Terrastore (http://code.google.com/p/terrastore)
● Data Model:
● Document based (JSON).
● Data Processing:
● Range queries, Predicates.
● Consistency:
● Per-document strict consistency.
● Scalability:
● Master-based.
29. NOSQL Products - Voldemort
● Voldemort (http://project-voldemort.com)
● Data Model:
● Key-Value.
● Data Processing:
● None.
● Consistency:
● Eventual consistency.
● Scalability:
● Peer-to-peer, gossip based.
30. NOSQL Products and Use Cases
31. Final words
● A New World.
● New paradigms.
● New use cases.
● New products.
● Don't dismiss the old stuff.
● Relational databases still have their place.
● Embrace change.
● May the NOSQL power be with you.
● Let the Polyglot Persistence era begin!