Presenter: Pierre-Yves Ritschard, CTO at Exoscale
Graphite is the go-to tool for sysadmins everywhere to store and retrieve time-series data. Cyanite is an alternative, Graphite-compatible daemon that uses Cassandra as its main storage engine. The talk focuses on how to build efficient time-series data models in Cassandra and how the ecosystem of tools around Cassandra can help process time series in batches, and it provides architectural insight into building truly scalable time-series pipelines.
2. @PYR
CTO at Exoscale, the safe home for your cloud applications
Open source developer: pithos, cyanite, riemann, collectd…
Recovering Operations Engineer
3. AIM OF THIS TALK
Presenting graphite and its ecosystem
Presenting cyanite
Showcasing simplicity through Cassandra
14. CARBON-RELAY
Provides sharding and replication
Forwards to the appropriate carbon-cache processes based on a provided hashing method
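As a rough illustration of hash-based routing with replication, here is a minimal consistent-hash ring. It is a sketch of the general technique, not carbon-relay's actual code; the `HashRing` class, the `vnodes` parameter and the node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping metric names to carbon-cache nodes."""

    def __init__(self, nodes, vnodes=100):
        # Place every node on the ring many times (virtual nodes)
        # so that metrics spread roughly evenly across nodes.
        self._ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def nodes_for(self, metric, replicas=1):
        """Return up to `replicas` distinct nodes responsible for `metric`."""
        start = bisect.bisect(self._keys, self._hash(metric))
        found = []
        # Walk clockwise from the metric's position, collecting distinct nodes.
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == replicas:
                break
        return found


ring = HashRing(["cache-a", "cache-b", "cache-c"])  # hypothetical node names
targets = ring.nodes_for("collectd.web01.cpu-0.idle", replicas=2)
```

Every relay and every reader has to build the same ring from the same node list, which is why the topology ends up being configured on all hosts.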
15. GRAPHITE-WEB
Simple Django-based HTTP API
Persists configuration to SQL
Data query and manipulation through a very simple DSL
Graph rendering
Composer client interface to build graphs
## sum CPU values
sumSeries(collectd.web01.cpu-*)
## provide memory percentage
alias(asPercent(web01.mem.used, sumSeries(web01.mem.*)), "mem percent")
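Expressions like these are passed to graphite-web as the `target` parameter of its render endpoint. A minimal sketch of building such a request URL follows; the host name and the `render_url` helper are assumptions made for illustration.

```python
from urllib.parse import urlencode


def render_url(base_url, target, since="-1h"):
    """Build a render API URL that asks for JSON datapoints."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


url = render_url(
    "http://graphite.example.com",  # hypothetical host
    'alias(asPercent(web01.mem.used,sumSeries(web01.mem.*)),"mem percent")',
)
```

Dropping `format=json` asks the same endpoint for a rendered image instead of raw datapoints.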
21. STATSD
Very popular metric service to integrate within applications.
Aggregates events in n-second windows
Ships off to graphite
statsd.increment 'session.open'
statsd.gauge 'session.active', 370
statsd.timing 'pdf.convert', 320
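On the wire, client calls like the ones above become short text datagrams such as `session.open:1|c`, `session.active:370|g` and `pdf.convert:320|ms`. The sketch below shows how one window of such events could be aggregated before being shipped to graphite. It is a simplification: sample rates are ignored, and the `.mean` and `.upper` output names are illustrative rather than statsd's exact naming.

```python
from collections import defaultdict


def parse(line):
    """Split a statsd line such as 'pdf.convert:320|ms' into its parts."""
    name, rest = line.split(":", 1)
    value, kind = rest.split("|")[:2]
    return name, float(value), kind


def flush(lines):
    """Aggregate one window of events into a {metric: value} mapping."""
    counters = defaultdict(float)
    gauges = {}
    timers = defaultdict(list)
    for line in lines:
        name, value, kind = parse(line)
        if kind == "c":
            counters[name] += value       # counters are summed
        elif kind == "g":
            gauges[name] = value          # gauges keep the last value
        elif kind == "ms":
            timers[name].append(value)    # timers are summarized below
    out = dict(counters)
    out.update(gauges)
    for name, values in timers.items():
        out[f"{name}.mean"] = sum(values) / len(values)
        out[f"{name}.upper"] = max(values)
    return out


window = [
    "session.open:1|c",
    "session.open:1|c",
    "session.active:370|g",
    "pdf.convert:320|ms",
    "pdf.convert:280|ms",
]
aggregated = flush(window)
```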
22. COLLECTD
Very popular collection daemon with a graphite destination
Every conceivable system metric
A wealth of additional metric sources (such as a fast statsd server)
<plugin write_graphite>
  <carbon>
    Host "graphite-host"
  </carbon>
</plugin>
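What such a graphite destination ultimately emits is carbon's plaintext protocol: one `path value timestamp` line per datapoint, by default on TCP port 2003. A minimal sketch, in which the `carbon_line` and `send` helpers and the metric name are made up for illustration:

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one datapoint as a carbon plaintext line."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send(lines, host="graphite-host", port=2003):
    """Ship already formatted lines to a carbon listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


line = carbon_line("collectd.web01.load.shortterm", 0.42, 1400000000)
```

Any daemon that accepts this line format on the same port can stand in for carbon without the collectors noticing.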
24. GRAFANA
Increasingly popular alternative to graphite-web, when paired with graphite-api
Inspired by the Kibana project for Logstash
Optional persistence to Elasticsearch for configuration
29. ESSENTIALLY A SINGLE-HOST SOLUTION
Built in the days when Cacti reigned
Innovative project at the time, which decoupled collection from storage and display
30. THE WHISPER FILE FORMAT
One file per metric
Optimized for space, not speed
Plenty of seeks
Only shared storage option is NFS…
In many ways, it can be seen as RRD in Python
31. SCALING STRATEGIES
Tacked on after the fact
The decoupled architecture means that both graphite-web and carbon need upfront knowledge of shard locations
33. IT GETS A BIT HAIRY
Cluster topology must be stored on all nodes
Manual replication mechanism (through carbon-relay)
Changing cluster topology means re-assigning shards by hand
34. WHAT GRAPHITE CAN KEEP
Persistence of configuration
Local data manipulation
35. WHAT GRAPHITE WOULD NEED
Automatic shard assignment
Replication
Easy management
Easy cluster topology changes (horizontal scalability)
36. THE CYANITE APPROACH
Leveraging Apache Cassandra to store time-series
Leveraging Graphite for the interface
37. A CASSANDRA-BACKED CARBON REPLACEMENT
Written in Clojure
Async I/O
No more whisper files
Fast storage
Horizontally scalable
Interfaced with graphite-web through graphite-cyanite
38. CYANITE DUTIES
Providing graphite-compatible input methods (carbon listeners)
Providing a way to retrieve metric names and metric time-series
Implemented as two protocols
A metric-store
A path-store
The rest is up to the graphite ecosystem, through graphite-cyanite
The recommended companion is graphite-api
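To make the split between the two stores concrete, here is a hedged sketch in which Python abstract classes stand in for the two protocols. The method names and the in-memory implementations are hypothetical and only illustrate the division of duties: one side knows which metric names exist, the other holds the datapoints.

```python
from abc import ABC, abstractmethod
from fnmatch import fnmatchcase


class PathStore(ABC):
    """Knows which metric names exist and answers glob queries."""

    @abstractmethod
    def register(self, path): ...

    @abstractmethod
    def search(self, pattern): ...


class MetricStore(ABC):
    """Stores datapoints and returns them for a time range."""

    @abstractmethod
    def insert(self, path, timestamp, value): ...

    @abstractmethod
    def fetch(self, path, start, end): ...


class MemoryPathStore(PathStore):
    def __init__(self):
        self._paths = set()

    def register(self, path):
        self._paths.add(path)

    def search(self, pattern):
        # Match segment by segment so that '*' never crosses a dot.
        want = pattern.split(".")
        return sorted(
            p for p in self._paths
            if len(p.split(".")) == len(want)
            and all(fnmatchcase(s, w) for s, w in zip(p.split("."), want))
        )


class MemoryMetricStore(MetricStore):
    def __init__(self):
        self._data = {}

    def insert(self, path, timestamp, value):
        self._data.setdefault(path, {})[timestamp] = value

    def fetch(self, path, start, end):
        series = self._data.get(path, {})
        return sorted((t, v) for t, v in series.items() if start <= t <= end)


paths, metrics = MemoryPathStore(), MemoryMetricStore()
for name in ("web01.mem.used", "web01.mem.free", "web01.cpu.idle"):
    paths.register(name)
for ts, value in ((10, 1.0), (20, 2.0), (30, 3.0)):
    metrics.insert("web01.mem.used", ts, value)
```

Keeping the two behind separate interfaces means either one could be swapped for a different backend without touching the other.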
41. LEADING ARCHITECTURE DRIVERS
Simplicity
Optimize for speed
As few moving parts as possible
Multi-tenancy
Resource efficiency
Remain compatible with the graphite ecosystem
43. CASSANDRA IS GREAT FOR TIME-SERIES
It bears repeating
High write to read ratio workload
No manual shard allocation or reassignment
Sorted wide columns mean efficient retrieval of data
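The effect of sorted wide columns can be imitated in memory: one partition per series, with datapoints kept ordered by time so that a range read is a contiguous slice. The sketch below is an assumption-laden stand-in, not Cyanite's schema; the `WideRows` class and the choice of `(tenant, path, resolution)` as partition key are illustrative.

```python
import bisect
from collections import defaultdict


class WideRows:
    """In-memory stand-in for a partition key plus a time-sorted clustering column."""

    def __init__(self):
        # partition key -> list of (time, value) kept sorted by time
        self._rows = defaultdict(list)

    def write(self, tenant, path, resolution, time, value):
        # Unlike Cassandra, a repeated timestamp is appended, not overwritten.
        bisect.insort(self._rows[(tenant, path, resolution)], (time, value))

    def read(self, tenant, path, resolution, start, end):
        """Return datapoints with start <= time <= end as one contiguous slice."""
        row = self._rows.get((tenant, path, resolution), [])
        lo = bisect.bisect_left(row, (start,))
        hi = bisect.bisect_right(row, (end, float("inf")))
        return row[lo:hi]


table = WideRows()
for ts in (30, 10, 20, 40):  # writes may arrive out of order
    table.write("acme", "web01.cpu.idle", 10, ts, float(ts))
window = table.read("acme", "web01.cpu.idle", 10, 20, 30)
```

Because the slice boundaries are found by binary search, the cost of a read depends on the size of the requested window rather than on the total length of the series.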
48. REPLACING MORE GRAPHITE PARTS, EXTENDING FUNCTIONALITY
Implement graphite's data manipulation functions
Remove the need for graphite-api or graphite-web when using grafana
Finish providing multi-tenancy options
49. PICKLE SUPPORT
Easier integration in existing architectures
Would allow integration with carbon-relay
50. ALTERNATIVE INPUT METHODS
Support queue input of metrics
Collectd already supports shipping graphite data to Apache Kafka
Support the statsd protocol directly
51. PROVIDE A CYANITE LIBRARY
Easy, standard-compliant storage from JVM-based applications
52. BATCH OPERATIONS
Compaction of rolled-up series
Dynamic thresholds
Great opportunity to leverage the interaction between Cassandra and Spark
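The core of a rollup job is small: group the datapoints of a series into coarser time buckets and keep one aggregate per bucket. A minimal sketch using averaging follows; the `rollup` function is hypothetical, and in a batch setting the same step would presumably be applied to each series independently.

```python
from collections import defaultdict


def rollup(points, step):
    """Average (timestamp, value) points into buckets of `step` seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % step].append(value)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


ten_second = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0)]
one_minute = rollup(ten_second, 60)
```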
53. A FEW TAKE-AWAYS
Cassandra enabled a quick win in about 1,100 lines of Clojure
Greatly simplified scaling strategy
Building block for a lot more
Good way to reduce technology creep if you're already using Cassandra
54. THANKS!
Cyanite owes a lot to:
Max Penet (@mpenet) for the great alia library
Bruno Renie (@brutasse) for graphite-api, graphite-cyanite and the initial nudge
DataStax for the awesome Cassandra Java driver
Its contributors
Apache Cassandra, obviously
@pyr – #CassandraSummit