13. • Frequent node failures
• Heavy disk fragmentation caused by deletes
• Slow reads from disk. Started storing in RAM.
• Primary -> Secondary caused downtime for some.
• Scaled both vertically and horizontally.
17. • Fantastic community. #cassandra on Twitter
• Easy to read documentation
• Linearly scalable. Easy to grow cluster.
• Low maintenance overhead for ops team.
• Handles time series data very well.
23. • A user’s feed is composed of entities, each with one set of actions
• A user’s feed contains only one of any given entity
• An entity’s set of actions contains up to seven of the most recent actions taken by that user’s network
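The feed rules above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the production code: the `Action` tuple, the `add_action` helper, and the dict-based feed are hypothetical names chosen for the example. Keying the feed by entity enforces "one of any given entity", and slicing enforces the cap of seven.

```python
from collections import namedtuple

MAX_ACTIONS = 7  # "up to seven of the most recent actions"

# Hypothetical shape for illustration; field names loosely follow the deck's action type.
Action = namedtuple("Action", ["created_on", "actor_id", "verb_id"])


def add_action(feed, entity_key, action):
    """Record an action under its entity, keeping only the newest MAX_ACTIONS."""
    actions = feed.get(entity_key, []) + [action]
    # Newest first, so the slice below drops the oldest actions.
    actions.sort(key=lambda a: a.created_on, reverse=True)
    feed[entity_key] = actions[:MAX_ACTIONS]
    return feed


feed = {}
project = ("project", 42)  # (entity type, entity id), both made up
for ts in range(1, 10):
    add_action(feed, project, Action(created_on=ts, actor_id=1, verb_id=3))
```

After nine actions on the same entity, the feed still holds a single entry for it, containing the seven newest actions.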
28. • App/cluster in production before anything works
• Test real life load
• Fail spectacularly without anybody noticing
• Deploy risky changes without fear
• Run alongside MongoDB
30. Query Patterns
• “Create your data models based on the queries you want to run” - Basically Everybody
• Wanted to…
  • Read a user’s feed entities by type and time of most recent action…separately.
  • Write/Update a user’s feed entities with new actions while knowing only user id and entity id
32. First Data Model
“An UPDATE in Cassandra works like an UPSERT! Let’s store the user’s entire feed in a single row in a table! It’s so simple!”
–Mark Dunphy, January 2015
33. CREATE TYPE activity.action (
    created_on timestamp,
    secondary_entity_id int,
    actor_id int,
    verb_id int
);
CREATE TYPE activity.entity (
    entity_type_id int,
    entity_id int
);
38. Second Data Model
“Okay let’s keep nearly the same model, but use INSERT and DELETE instead of always UPDATE. Just use batch statements.”
–Mark Dunphy, January 2015
40. Second Data Model
• Batch statements lose the benefit of Cassandra being distributed.
• All queries in a batch go through the same coordinator, which puts a lot of stress and responsibility on one node.
• Use concurrency and prepared statements instead. DataStax drivers make this easy.
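As a rough sketch of the "concurrency instead of batches" idea, the Python below issues one statement per row from a thread pool. The `execute` argument is a stand-in for a driver session's execute call with a prepared statement, and `write_concurrently` is a hypothetical helper written for this example. In a real application the driver's own concurrency helpers (for instance `execute_concurrent_with_args` in the DataStax Python driver's `cassandra.concurrent` module) would likely be a better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def write_concurrently(execute, statement, rows, max_workers=8):
    """Run one statement per row in parallel; return (params, error) for each failure."""

    def run(params):
        try:
            # `execute` is an assumption: any callable taking (statement, params).
            execute(statement, params)
            return None
        except Exception as exc:  # collect failures instead of aborting the rest
            return (params, exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run, rows))
    return [r for r in results if r is not None]


# Usage with a fake executor standing in for a real session.
seen = []


def fake_execute(statement, params):
    if params[0] == 2:
        raise RuntimeError("simulated write failure")
    seen.append(params)


failures = write_concurrently(
    fake_execute,
    "INSERT INTO projects (user_id, payload) VALUES (?, ?)",  # illustrative only
    [(1, "a"), (2, "b"), (3, "c")],
)
```

Because each row is sent as its own request, a token-aware driver can route it to a replica that owns the data rather than funnelling everything through one coordinator.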
49. Write Strategy
• “User A comments on Project A. User B follows User A.”
• A request goes out to add the comment action to User B’s feed
• Read existing actions for that entity (Project A) in B’s feed. Push new action on top.
• Write the new actions list into a new “row” in the projects table
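The write path above can be sketched with plain Python dicts standing in for the projects table. Everything here is illustrative: `fan_out`, the row layout, and the `followers` mapping are hypothetical, and removing the entity's previous row is an assumption made so that the feed keeps only one row per entity.

```python
MAX_ACTIONS = 7  # cap on actions kept per entity


def fan_out(projects, followers, actor_id, entity_id, verb, created_on):
    """Add an action by `actor_id` on `entity_id` to every follower's feed."""
    new_action = {"actor_id": actor_id, "verb": verb, "created_on": created_on}
    for user_id in followers.get(actor_id, []):
        rows = projects.setdefault(user_id, [])
        # Read the existing actions for this entity in the follower's feed.
        existing = [r for r in rows if r["entity_id"] == entity_id]
        previous = (
            max(existing, key=lambda r: r["created_on"])["actions"] if existing else []
        )
        # Push the new action on top and keep only the most recent ones.
        actions = ([new_action] + previous)[:MAX_ACTIONS]
        # Assumption: delete the old row so the feed holds one row per entity.
        rows[:] = [r for r in rows if r["entity_id"] != entity_id]
        # Write the new actions list into a new "row".
        rows.append(
            {"entity_id": entity_id, "created_on": created_on, "actions": actions}
        )


USER_A, USER_B, PROJECT_A = 1, 2, 100  # made-up ids
followers = {USER_A: [USER_B]}  # User B follows User A
projects = {}
fan_out(projects, followers, USER_A, PROJECT_A, "comment", created_on=10)
fan_out(projects, followers, USER_A, PROJECT_A, "comment", created_on=20)
```

After both calls, User B's feed has a single row for Project A, timestamped with the latest action and listing both comments newest first.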
50. Read Strategy
• SELECT * FROM projects WHERE user_id = 123 AND created_on > 123214373
• Optimized for quick/easy reads. It is more important that a user’s feed loads quickly than that it updates quickly.
• Use timestamp to “page” through data.
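Timestamp paging as described above can be sketched as follows. This is an in-memory approximation of the query shape on the slide, not driver code: `read_page` and the row dicts are hypothetical, and the page size is an arbitrary choice for the example. The last `created_on` of each page becomes the cursor for the next request.

```python
def read_page(rows, user_id, after, limit=2):
    """Return up to `limit` rows for `user_id` with created_on > `after`, plus the next cursor."""
    matching = sorted(
        (r for r in rows if r["user_id"] == user_id and r["created_on"] > after),
        key=lambda r: r["created_on"],
    )
    page = matching[:limit]
    # The cursor is the timestamp of the last row returned, or None when exhausted.
    next_cursor = page[-1]["created_on"] if page else None
    return page, next_cursor


rows = [
    {"user_id": 123, "created_on": 10},
    {"user_id": 123, "created_on": 20},
    {"user_id": 123, "created_on": 30},
    {"user_id": 999, "created_on": 15},  # another user's row, never returned
]

page1, cursor1 = read_page(rows, user_id=123, after=0)
page2, cursor2 = read_page(rows, user_id=123, after=cursor1)
page3, cursor3 = read_page(rows, user_id=123, after=cursor2)
```

One caveat of this approach is that rows sharing the cursor's exact timestamp would be skipped, so it assumes timestamps are unique within a user's feed.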
51. Lessons Learned
• Duplicate your data to achieve desired queries. Storage is cheap. Writes are cheap.
• Think outside the box. Cassandra is not relational.
• Never ever ever ignore inserts/deletes in favor of an update-only workflow. Never. It is literally insane.
52. Final Specs
• 16 node cluster on AWS EC2 c3.8xlarge
• Mix of SizeTieredCompactionStrategy and DateTieredCompactionStrategy
• NetworkTopologyStrategy
• Replication factor 3
• ConsistencyLevel = ONE for most requests
53. Final Specs
• Bursty write volume. Consistent read volume.
• 5k to 80k writes per second
• 2k to 4k reads per second
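For a sense of scale, the figures above can be turned into a rough per-node write load. The calculation below is a back-of-the-envelope sketch that assumes writes are spread evenly across the 16 nodes and that every write lands on all 3 replicas; real clusters are rarely that uniform, and `replica_writes_per_node` is a name invented for this example.

```python
NODES = 16               # cluster size from the specs
REPLICATION_FACTOR = 3   # each write is stored on three replicas


def replica_writes_per_node(client_writes_per_sec):
    """Average replica writes each node absorbs per second, assuming even distribution."""
    return client_writes_per_sec * REPLICATION_FACTOR / NODES


low = replica_writes_per_node(5_000)    # quiet period
peak = replica_writes_per_node(80_000)  # burst
```

Under these assumptions each node handles roughly 940 replica writes per second when quiet and 15,000 per second at peak.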