Polyglot Persistence
“Polyglot Persistence, like polyglot
programming, is all about choosing the right
persistence option for the task at hand”
http://www.nearinfinity.com/blogs/scott_leberknight/polyglot_persistence.html
http://martinfowler.com/bliki/PolyglotPersistence.html
It all started from ...
a set of papers released by Google & Amazon
• Google Filesystem (2003)
http://research.google.com/archive/gfs.html
• Google MapReduce (2004)
http://research.google.com/archive/mapreduce.html
• Google BigTable (2006)
http://research.google.com/archive/bigtable.html
• Amazon Dynamo (2007)
http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
Apache HBase
• Java
• designed to be able to store massive amounts of data
• speaks HTTP / REST, Thrift, Avro
• based on Google BigTable
• persistence through HDFS (Hadoop)
• Map/Reduce with Hadoop
• designed for real time workloads
• https://hbase.apache.org/
Apache Cassandra
• Java
• inspired by Google BigTable and Amazon Dynamo
• tunable trade-offs
• query by column and range of keys
• really fast writes
• excellent for a large number of high speed counters
• Map/Reduce possible with Hadoop
• http://cassandra.apache.org/
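To make "query by column and range of keys" concrete, here is a minimal sketch that keeps row keys in sorted order and slices a key range with binary search. It is a toy in-memory model, not the Cassandra API; `SortedRows` and its methods are hypothetical names, and it assumes keys are stored in order.

```python
import bisect


class SortedRows:
    """Toy model of rows kept in key order (hypothetical, not the Cassandra API)."""

    def __init__(self):
        self._keys = []   # row keys, always kept sorted
        self._rows = {}   # row key -> {column name: value}

    def put(self, key, column, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key][column] = value

    def range(self, start, end, column):
        """Return (key, value) pairs for start <= key <= end that have the column."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._rows[k][column])
                for k in self._keys[lo:hi] if column in self._rows[k]]


rows = SortedRows()
rows.put("user:003", "name", "carol")
rows.put("user:001", "name", "alice")
rows.put("user:002", "email", "bob@example.com")
result = rows.range("user:001", "user:002", "name")
```

Rows that lack the requested column are skipped, which mirrors the idea that each row may carry a different set of columns.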
MongoDB
• C++
• document database (bson) with rich indexing
• master / slave replication
• built-in sharding
• auto failover with replica sets
• map/reduce with javascript
• server side javascript
• journaling
• fast in-place updates
• http://www.mongodb.org/
Apache CouchDB
• Erlang
• document database (json)
• bi-directional replication
• advanced conflict resolution
• MVCC - writes do not block reads
• exposes a stream of realtime updates
• needs compacting
• indexing via views (JS)
• attachment handling
• https://couchdb.apache.org/
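The MVCC and conflict handling bullets can be sketched as a store where every update must name the revision it was based on, and a stale revision is rejected instead of silently overwriting. This is a simplified illustration: CouchDB uses revision strings in a `_rev` field, while the sketch assumes plain integers, and `RevisionedStore` and `Conflict` are hypothetical names.

```python
class Conflict(Exception):
    """Raised when an update is based on a revision that is no longer current."""


class RevisionedStore:
    """Toy revision-checked document store (hypothetical, not the CouchDB API)."""

    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = RevisionedStore()
rev1 = store.put("doc", {"n": 1})            # create: no revision exists yet
rev2 = store.put("doc", {"n": 2}, rev=rev1)  # update against the current revision
try:
    store.put("doc", {"n": 3}, rev=rev1)     # stale revision is rejected
    stale_rejected = False
except Conflict:
    stale_rejected = True
```

Because a writer replaces the stored entry rather than locking it, a reader always gets a complete revision of the document.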
Riak (Basho)
• Erlang, C, Javascript
• key, value store
• focus on fault tolerance and cross datacenter replication
• speaks HTTP/REST or custom binary
• tunable trade-offs (N, R, W)
• mapreduce in JS or Erlang
• full-text indexing with riak search
• http://wiki.basho.com/
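The N, R, W trade-off can be illustrated with the usual quorum rule: with N replicas, a read that contacts R of them is guaranteed to see at least one replica touched by a write acknowledged by W of them when R + W > N. The sketch below checks that rule against a brute-force enumeration; the function names are hypothetical and nothing here calls Riak itself.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Quorum rule: read and write sets must share a replica when r + w > n."""
    return r + w > n


def always_intersect(n, r, w):
    """Brute force: does every set of r replicas share a member with every set of w?"""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))


strong = quorums_overlap(3, 2, 2)   # reads overlap the latest acknowledged write
fast = quorums_overlap(3, 1, 1)     # cheaper operations, but reads may be stale
```

Lowering R or W trades consistency for latency and availability, which is the tuning the slide refers to.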
Neo4j
• Java
• graph database
• speaks HTTP/REST
• standalone or embeddable in Java apps
• full ACID
• web admin interface
• nodes & relationships can have metadata
• indexing
• http://neo4j.org/
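As a rough picture of nodes and relationships that both carry metadata, the sketch below models a tiny property graph with plain Python structures and walks it breadth-first. The data and the helper names are made up for illustration, and this is not the Neo4j API.

```python
from collections import deque

# Nodes and relationships both carry a dictionary of properties (metadata).
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carol"}}
relationships = [
    (1, "KNOWS", 2, {"since": 2009}),
    (2, "KNOWS", 3, {"since": 2011}),
]


def neighbours(node_id, rel_type):
    """Ids of nodes reached by one outgoing relationship of the given type."""
    return [end for start, kind, end, _props in relationships
            if start == node_id and kind == rel_type]


def reachable(start, rel_type):
    """Breadth-first traversal along outgoing relationships of one type."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for nxt in neighbours(current, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nodes[nxt]["name"])
                queue.append(nxt)
    return order


friends = reachable(1, "KNOWS")
```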
Redis
• C
• disk-backed data structure server
• master-slave replication
• supports: strings, lists, sets, hashes, sorted sets
• batch operations
• values can be expired
• Pub/Sub for messaging
• ideal for rapidly changing data that fits in memory
• http://redis.io/
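"Values can be expired" means a key can be given a time to live, after which reads no longer return it (Redis exposes this through commands such as EXPIRE). A minimal sketch of the idea follows, assuming lazy expiry on access and an injectable clock; `ExpiringStore` is a hypothetical toy, not a Redis client.

```python
import time


class ExpiringStore:
    """Toy key-value store with per-key time to live (hypothetical, not Redis)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be exercised without sleeping
        self._data = {}       # key -> (value, expiry time or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]   # lazy expiry: drop the key when it is next read
            return None
        return value


now = [0.0]                                  # fake clock, in seconds
store = ExpiringStore(clock=lambda: now[0])
store.set("session", "abc", ttl=10)
store.set("config", "keep")                  # no ttl: never expires
before = store.get("session")
now[0] = 10.0
after = store.get("session")
```

This fits the "rapidly changing data" use case: sessions, caches, and rate-limit windows clean themselves up without an explicit delete.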
elasticsearch
• Java
• based on Apache Lucene
• distributed by design
• cloud aware (Amazon)
• understands JSON objects
• no-schema required
• simple multi-tenancy
• real-time search
• scale to 100s of machines
• http://www.elasticsearch.org/
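Lucene-based search rests on an inverted index, a map from terms to the documents that contain them. The sketch below builds one from JSON-like objects without declaring a schema first. It is a deliberately naive illustration (whitespace tokenizing, lowercase only) with hypothetical function names, not the elasticsearch API.

```python
from collections import defaultdict


def build_index(docs):
    """Map (field, token) -> set of document ids, for every field that appears."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            for token in str(value).lower().split():
                index[(field, token)].add(doc_id)
    return index


def search(index, field, term):
    """Return the sorted ids of documents whose field contains the term."""
    return sorted(index.get((field, term.lower()), set()))


docs = {
    1: {"title": "Polyglot persistence", "tags": "nosql"},
    2: {"title": "Graph persistence", "tags": "neo4j"},
}
index = build_index(docs)
hits = search(index, "title", "Persistence")
```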
Apache SolrCloud
• Java
• based on Apache Lucene (share the same repo)
• adds distributed capabilities to Solr
• based on ZooKeeper for coordination & config
• automatic management of multiple shards
• automatic fail-over
• durable writes
• https://wiki.apache.org/solr/SolrCloud
Apache Hadoop
• Java, C/C++
• set of distributed systems (hdfs, mr etc.)
• framework for distributed data processing
• simple programming model (map / reduce)
• can scale to 1000s of machines
• designed to be highly available at the application level
• https://hadoop.apache.org/
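The map / reduce programming model boils down to three steps: map each input record to key/value pairs, group the pairs by key, and reduce each group to a result. The classic word count is sketched below in a single process; the function names are hypothetical, and real Hadoop jobs spread these phases across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big table"])
```

Since each map call looks at one record and each reduce call at one key, both phases can run in parallel, which is what lets the model scale out.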