2. Hivereader.com
Our common stack is built on:
Postgres: super awesome and fast (once you learn which knobs to turn).
Lots of cool features and tools: pgtune, pg_top, PgBouncer
Django is a high-level Python Web framework that encourages
rapid development and clean, pragmatic design.
Memcached: in-memory key-value store for small chunks of data.
Super simple and awesome.
17. Hivereader.com
But wait, can Postgres handle this?
Most likely, with partitioning and sharding.
Sharding can mean lots of extra app code.
What about when you outgrow your shard key?
“Don’t shard until you have to”
- every single talk I’ve seen
Slony?
“master to multiple slaves” replication
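To make the shard-key worry above concrete, here is a minimal Python sketch of application-level shard routing. The shard names, the MD5-plus-modulo scheme, and the helper names are all assumptions for illustration; this is app code you would write yourself, not something Postgres provides.

```python
import hashlib

# Hypothetical shard names; in a real app these would map to database connections.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the shard key (here, a user id)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def moved_keys(keys, old_shards, new_shards):
    """Return the keys whose shard changes when the shard list changes."""
    return [k for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)]


if __name__ == "__main__":
    user_ids = range(1000)
    bigger = SHARDS + ["shard_4"]
    moved = moved_keys(user_ids, SHARDS, bigger)
    print("%d of 1000 keys move when a fifth shard is added" % len(moved))
```

With plain modulo routing, most keys land on a different shard once the shard count changes, which is one reason outgrowing a shard layout hurts.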
19. Hivereader.com
Nearly all of them are based on 2 papers
Built on the CAP theorem
The theorem began as a conjecture made by University of California, Berkeley computer
scientist Eric Brewer at the 2000 Symposium on Principles of Distributed Computing.
In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's
conjecture, rendering it a theorem.
21. Hivereader.com
Redis: super fast
Advanced key-value store
Think of it as super memcached. With union math.
All data must fit in RAM
It is often referred to as a data structure server
Keys can contain strings, hashes, lists, sets and sorted sets
Not a db solution but more of a helper.
This is now a part of our basic stack for most apps.
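As a rough picture of the "union math" point, the sketch below mimics Redis-style set commands with plain Python sets. The `TinySetStore` class and the key names are hypothetical, and this is an in-memory stand-in rather than a Redis client; only the method names `sadd`, `sunion` and `sinter` are borrowed from the real Redis commands.

```python
class TinySetStore:
    """Hypothetical in-memory stand-in for a key -> set store (not a Redis client)."""

    def __init__(self):
        self._data = {}

    def sadd(self, key, *members):
        """Add members to the set at key; return how many were new."""
        current = self._data.setdefault(key, set())
        before = len(current)
        current.update(members)
        return len(current) - before

    def sunion(self, *keys):
        """Union of the sets stored at the given keys."""
        result = set()
        for key in keys:
            result |= self._data.get(key, set())
        return result

    def sinter(self, *keys):
        """Intersection of the sets stored at the given keys."""
        sets = [self._data.get(key, set()) for key in keys]
        return set.intersection(*sets) if sets else set()


store = TinySetStore()
store.sadd("readers:python", "ann", "bob")
store.sadd("readers:django", "bob", "cy")
everyone = store.sunion("readers:python", "readers:django")  # in either set
both = store.sinter("readers:python", "readers:django")      # in both sets
```

The appeal is that the store does the set arithmetic, so the app does not have to pull both collections and merge them itself.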
22. Hivereader.com
Document store
JSON-like documents with dynamic schemas
Ad-hoc queries
Indexing
Load balancing: MongoDB scales horizontally using sharding
MongoDB uses a readers-writer lock that allows concurrent read access to a database but gives exclusive access to a single write operation. While a write lock is held, no other read or write operation may share the lock.
Global write lock*
Uncompressed field names
Safe mode (acknowledged writes) is off by default
Just google “mongo problems” or “moving off mongodb”
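The readers-writer lock described above can be sketched in a few lines of Python. This is a generic illustration of the locking idea under simple assumptions (no writer priority, no timeouts), not MongoDB's actual implementation; the class and method names are made up for the example.

```python
import threading


class ReadersWriterLock:
    """Many readers may hold the lock at once; a writer holds it alone."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently inside
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            # Readers only wait for an active writer, never for each other.
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            # A writer waits until there is no writer and no reader.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

If one such lock guards a whole server or database, every write stalls every reader, which is the complaint behind "global write lock". Note that this simple version can starve writers under a steady stream of readers.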
23. Hivereader.com
Uses pre-defined column family format
Map Reduce
Used by everyone with ‘big data’ problems
Amazing workhorse for data
You need a sizable cluster
Cluster setup can be difficult
/ hbase
I need to personally spend more time with this
24. Hivereader.com
CouchDB: JSON to store data, JavaScript for MapReduce and HTTP for an API.
Views: embedded map/reduce
Multi-master replication
BigCouch, Couchbase, Membase?
Kind of in a dev rut...
...but just pushed a new huge upgrade
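For a feel of what an embedded map/reduce view does, here is a small Python sketch of the idea. Real views in this kind of store are written in JavaScript and run inside the database; the documents, function names and the `run_view` helper below are hypothetical and only show the map-then-reduce shape.

```python
from collections import defaultdict

# Hypothetical documents, shaped like the JSON a document store might hold.
DOCS = [
    {"_id": "1", "type": "post", "author": "ann", "words": 120},
    {"_id": "2", "type": "post", "author": "bob", "words": 300},
    {"_id": "3", "type": "post", "author": "ann", "words": 80},
    {"_id": "4", "type": "comment", "author": "cy", "words": 10},
]


def map_words_by_author(doc):
    """Map step: emit (key, value) pairs for each document of interest."""
    if doc.get("type") == "post":
        yield doc["author"], doc["words"]


def reduce_sum(values):
    """Reduce step: collapse all values emitted for one key."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


totals = run_view(DOCS, map_words_by_author, reduce_sum)
```

Here `totals` maps each post author to a total word count; comments are skipped because the map step emits nothing for them.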
25. Hivereader.com
Riak: based heavily on Amazon's Dynamo paper
Key Value store
Super good about failure, “no downtime”
Map Reduce / Secondary Indexes
Built-in full text search
Link walking
2 types of mapreduce
JavaScript - can be slow as hell
Erlang - super fast
26. Hivereader.com
Cassandra: key-value + row-oriented = column family
Linear scalability and fault-tolerance on commodity
hardware or cloud infrastructure
Built at Facebook for Inbox Search
Has CQL3 - think SQL, kind of
Pre-baked auto-clustering AMI
Super fast writes
Compresses data that’s not accessed a lot
Can tie in to Hadoop for big map reduce
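One way to read "key value + row-oriented = column family" is as a map from row key to a small map of named columns. The Python sketch below models only that shape in memory; the `ColumnFamily` class, its methods and the sample data are hypothetical and have nothing to do with a real driver or with CQL.

```python
from collections import defaultdict


class ColumnFamily:
    """Hypothetical in-memory model: row key -> {column name: value}."""

    def __init__(self, name):
        self.name = name
        self._rows = defaultdict(dict)

    def insert(self, row_key, columns):
        """Write (or overwrite) some columns of a row; other columns are untouched."""
        self._rows[row_key].update(columns)

    def get(self, row_key, column=None):
        """Read a whole row, or a single column when a name is given."""
        row = self._rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)


users = ColumnFamily("users")
users.insert("ann", {"email": "ann@example.com", "city": "Austin"})
users.insert("bob", {"email": "bob@example.com"})  # rows need not share columns
users.insert("ann", {"city": "Denver"})            # updates one column only
```

Lookups go by row key first, like a key-value store, and each row can carry a different set of columns.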
28. Hivereader.com
Things to think about:
Is eventual consistency ok for you?
Do you know the queries you need right now?
Is your data complicated or simple?
How fast does it grow?
How long do you want that data to hang around?
Really think about trade-offs.
Every system has its good and bad sides.
There is no “winner”, so stop searching “which is best”
Think about which fits your use case
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
http://docs.basho.com/riak/1.2.1/references/appendices/comparisons/
Tons of links out there; just make sure they are relatively recent.