Slides from my presentation to the Austin Cassandra Meetup, where I discuss how Cassandra fits into Rackspace Cloud Monitoring.
Hint: It's just a small part.
4. CM Overview
Thousands of servers
Pre-existing solutions
Lessons learned from Cloudkick
Internal versus external
Millions of checks
http://www.flickr.com/photos/jean_koulev/2697677595/
12. Control
Cluster
Metadata
State
Three datacenters
High RF
Wide rows
Easy dump & load
https://github.com/racker/cassandra-syncer
13. Data Model
Rich but simple
Objects used together stored together
Simple parent-child relations
One row per customer (tenant)
Composite column names
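A minimal sketch of the "one row per tenant, composite column names" idea, using an in-memory dict in place of Cassandra. The names `put` and `children`, and the (parent, kind, child) layout of the composite, are assumptions for illustration only; the slides do not give the actual column layout.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in: one wide row (a dict of columns) per tenant.
rows = defaultdict(dict)

def put(tenant_id, parent_id, kind, child_id, value):
    """Store a child object under a composite column name (assumed layout)."""
    rows[tenant_id][(parent_id, kind, child_id)] = value

def children(tenant_id, parent_id, kind):
    """Return all children of one parent, ordered by column name.

    Cassandra would serve this as a column slice on the composite prefix;
    here the columns are simply sorted and filtered.
    """
    row = rows[tenant_id]
    return [row[col] for col in sorted(row) if col[:2] == (parent_id, kind)]

put("tenant1", "entityA", "check", "ch2", {"type": "http"})
put("tenant1", "entityA", "check", "ch1", {"type": "ping"})
put("tenant1", "entityB", "check", "ch3", {"type": "dns"})
```

Because the parent id leads the composite name, objects used together sort next to each other inside the tenant's row.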
19. Control Cluster
API server is Node.js
JavaScript ORM library
• Define object model in JS
• Read/write entire objects
• Never think about CQL
node-cassandra-client
https://github.com/racker/node-cassandra-client
25. Rollup Concepts
Slot (Range)
• Pegged at 4032 slots
• One slot is a range of seconds (varies with granularity)
• metrics_locator CF
• Key is granularity name + slot num
• Columns index keys in rollup tables
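The slot arithmetic above can be sketched as follows. Only the 4032 figure and the "granularity name + slot num" key come from the slides; the seconds-per-slot table, the comma separator, and the function names are assumptions.

```python
NUM_SLOTS = 4032  # from the slide: slots are pegged at 4032

# Assumed slot widths in seconds for each granularity (hypothetical values).
SECONDS_PER_SLOT = {"5m": 300, "20m": 1200, "60m": 3600}

def slot_for(timestamp, granularity):
    """Map a Unix timestamp to a slot number, wrapping after NUM_SLOTS."""
    return (timestamp // SECONDS_PER_SLOT[granularity]) % NUM_SLOTS

def locator_key(granularity, slot):
    """Build a metrics_locator row key (the separator is an assumption)."""
    return f"{granularity},{slot}"
```

Under the assumed 300-second width, 4032 slots cover 4032 × 300 s = 1,209,600 s, which is exactly 14 days before the slot numbers wrap.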
27. Full Resolution! Arrival
• time, name, several metrics
• metric = name, type, value
• Compute locator and slot
• Insert metrics: col=timestamp, value=encoded metric
• Single Cassandra APPLY BATCH;
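A rough sketch of the arrival path, building the list of mutations that would go into one batch. No Cassandra driver is used. The `metrics_full` column family name, the `full` granularity label, the JSON encoding, and the dotted locator format are all assumptions, since the slides name only `metrics_locator`.

```python
import json

def encode_metric(metric_type, value):
    """Hypothetical encoding; the slides do not describe the real one."""
    return json.dumps({"type": metric_type, "value": value})

def build_batch(locator_prefix, timestamp, metrics,
                slot_seconds=300, num_slots=4032):
    """Return (column_family, row_key, column, value) tuples for one arrival.

    `metrics` is a list of (name, type, value) triples that share a timestamp.
    """
    slot = (timestamp // slot_seconds) % num_slots
    mutations = []
    for name, metric_type, value in metrics:
        locator = f"{locator_prefix}.{name}"
        # Index the locator under its slot so a rollup can find it later.
        mutations.append(("metrics_locator", f"full,{slot}", locator, ""))
        # Store the data point itself: column = timestamp, value = encoded metric.
        mutations.append(("metrics_full", locator, timestamp,
                          encode_metric(metric_type, value)))
    return mutations
```

Collecting every mutation first is what allows the whole arrival to be sent as a single batch.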
28. Rollups
• Two types
– Rollup all metrics from timeX to timeY
– Rollup a single metric from timeX to timeY
– Times may span multiple slots (ranges)
• Use rollups to produce rollups
– E.g.: use 20m data points to create 60m point.
– Store number of data points with rollup
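Storing the number of data points is what makes rollups of rollups correct for statistics such as the mean. A minimal sketch, assuming each rollup keeps count, mean, min, and max (the slides do not list the stored statistics, and the `Rollup` class is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Rollup:
    count: int      # number of raw data points summarized
    mean: float
    minimum: float
    maximum: float

def rollup_points(values):
    """Summarize raw data points into a single rollup."""
    return Rollup(len(values), sum(values) / len(values),
                  min(values), max(values))

def merge(rollups):
    """Combine finer rollups (e.g. 20m) into one coarser rollup (e.g. 60m)."""
    total = sum(r.count for r in rollups)
    # Weight each mean by its point count; a plain average of means
    # would be wrong whenever the counts differ.
    mean = sum(r.mean * r.count for r in rollups) / total
    return Rollup(total, mean,
                  min(r.minimum for r in rollups),
                  max(r.maximum for r in rollups))
```

For example, merging a rollup of `[1.0, 2.0, 3.0]` with a rollup of `[10.0]` gives a mean of 4.0, whereas averaging the two means without counts would give 6.0.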
29. Rollups
• Gotchas!
– Do not roll up a coarse range when a finer range that feeds data to it is scheduled for rollup shortly
60m | | | …
20m | | | | | | | |…
.
5m |||||||||||||||||||||||||||||…
– Mind the “tail” during datapoint queries (calculate rollups on the fly)
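The first gotcha can be sketched as a simple guard: before rolling up a coarse slot, check whether any finer slot that feeds it is still pending. The function names and the `pending_fine_slots` set are hypothetical, and the sketch ignores wrap-around at 4032 slots for brevity.

```python
def finer_slots(coarse_slot, coarse_seconds, fine_seconds):
    """Return the finer-granularity slot numbers covered by one coarse slot."""
    per_coarse = coarse_seconds // fine_seconds
    start = coarse_slot * per_coarse
    return range(start, start + per_coarse)

def can_rollup(coarse_slot, coarse_seconds, fine_seconds, pending_fine_slots):
    """True only if no finer slot feeding this coarse slot awaits rollup."""
    return not any(slot in pending_fine_slots
                   for slot in finer_slots(coarse_slot, coarse_seconds,
                                           fine_seconds))
```

With 60m and 20m granularities, coarse slot 2 is fed by 20m slots 6, 7, and 8, so it waits as long as any of those three is still scheduled.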
30. It Scales
Rollup operations are idempotent*
Simplifies availability
Rollups are easily parallelized
Hash partition the locator space
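One way to hash partition the locator space across rollup workers, as a sketch only. The choice of MD5, the modulo scheme, and the function names are assumptions; the slides do not say how the partitioning is done.

```python
import hashlib

def shard_for(locator, num_shards):
    """Deterministically map a locator string to a shard index."""
    digest = hashlib.md5(locator.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shards.
    return int.from_bytes(digest[:8], "big") % num_shards

def partition(locators, num_shards):
    """Group locators by shard so each worker rolls up only its own set."""
    shards = [[] for _ in range(num_shards)]
    for locator in locators:
        shards[shard_for(locator, num_shards)].append(locator)
    return shards
```

Because the mapping depends only on the locator, any worker can compute it without coordination, and because rollups are idempotent, two workers briefly covering the same shard produce the same result.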
31. But…
What if data arrives after rollup is performed?
More than 24 hours late: don’t care, forget it
Else treat normally: slots are scheduled for rollups as they age
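The late-data rule can be sketched as below. The 24-hour cutoff is from the slide; the `dirty_slots` set (slots awaiting a rollup), the 300-second slot width, and the function name are assumptions.

```python
MAX_LATENESS_SECONDS = 24 * 60 * 60  # from the slide: later than this is dropped

def handle_arrival(metric_timestamp, now, dirty_slots,
                   slot_seconds=300, num_slots=4032):
    """Accept or drop a data point; accepted points mark their slot dirty.

    Returns True if the point was accepted, False if it was too late.
    """
    if now - metric_timestamp > MAX_LATENESS_SECONDS:
        return False  # more than 24 hours late: forget it
    # Otherwise treat it normally: the slot is rolled up (again) as it ages.
    dirty_slots.add((metric_timestamp // slot_seconds) % num_slots)
    return True
```

Re-marking an already rolled-up slot is safe here because the rollup operation is idempotent, so a second pass simply recomputes the same range with the late point included.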