BETTER GRAPHITE STORAGE WITH CYANITE

PIERRE-YVES RITSCHARD
@PYR
#CASSANDRASUMMIT

@PYR
CTO at exoscale, the safe home for your cloud applications
Open source developer: pithos, cyanite, riemann, collectd…
Recovering Operations Engineer

AIM OF THIS TALK
Presenting graphite and its ecosystem
Presenting cyanite
Show-casing simplicity through cassandra

OUTLINE
Graphite overview
The problem with graphite
Cyanite solutions & internals
Looking forward

FROM THE SITE
Graphite does two things:
1. Store numeric time-series data
2. Render graphs of this data on demand
http://graphite.readthedocs.org

SCOPE
A metrics tool
Not a complete monitoring solution
Interacts with metric submission tools

WHY ARE METRICS IMPORTANT
Outside the scope of this talk
Narrowing the gap between map and territory

GRAPHITE COMPONENTS
whisper
carbon
graphite-web

WHISPER
RRD-like storage library
Written in python
Each file contains different roll-up periods and an aggregation method
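To make the roll-up idea concrete, here is a minimal sketch of how raw points could be grouped into coarser periods with an aggregation method. It is not whisper's actual code; the function name roll_up and the sample data are hypothetical.

```python
from statistics import mean


def roll_up(points, period, aggregate=mean):
    """Group (timestamp, value) points into buckets of `period` seconds
    and reduce each bucket with `aggregate` (hypothetical helper)."""
    buckets = {}
    for timestamp, value in points:
        # Align each timestamp to the start of its bucket.
        start = timestamp - (timestamp % period)
        buckets.setdefault(start, []).append(value)
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


# Five raw points at 10-second resolution, rolled up to one-minute buckets.
raw = [(0, 1.0), (10, 3.0), (20, 5.0), (60, 10.0), (70, 20.0)]
per_minute = roll_up(raw, 60)
per_minute_max = roll_up(raw, 60, aggregate=max)
```

Swapping the aggregate argument (mean, max, sum, ...) mirrors the choice of aggregation method that each whisper file carries.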

CARBON
Asynchronous (twisted) TCP and UDP service to input time-series data
Simple storage rules
Split across several daemons
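As an illustration of what carbon accepts, the sketch below formats and sends one line of the plaintext protocol ("path value timestamp", newline-terminated). The host, the port 2003 default and the metric name are assumptions made for the example.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of carbon's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="127.0.0.1", port=2003):
    """Send one plaintext line over TCP (host and port are assumed values)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


# Formatting only; call send_metric(line) against a running carbon listener.
line = format_metric("collectd.web01.load.shortterm", 0.42, 1400000000)
```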

CARBON-CACHE
Main carbon daemon
Temporarily caches values to RAM
Writes out to whisper

CARBON-AGGREGATOR
Aggregates data and forwards to carbon-cache
Less I/O strain on the filesystem
At the expense of resolution

CARBON-RELAY
Provides sharding and replication
Forwards to appropriate carbon-cache processes based on a provided hashing method
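The hashing step can be pictured with a small hash ring. This is a generic consistent-hashing sketch, not carbon-relay's exact algorithm; the class name HashRing, the node names and the replica count are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping metric paths to nodes."""

    def __init__(self, nodes, replicas=100):
        # Place several virtual points per node on the ring to even out load.
        self._ring = sorted(
            (self._position(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, metric_path):
        # Pick the first ring point after the metric's hash, wrapping around.
        index = bisect.bisect(self._positions, self._position(metric_path))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
target = ring.node_for("collectd.web01.cpu-0.user")
```

Every relay and every reader has to build the same ring from the same node list, which is why the cluster topology must be known everywhere.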

GRAPHITE-WEB
Simple Django-based HTTP API
Persists configuration to SQL
Data query and manipulation through a very simple DSL
Graph rendering
Composer client interface to build graphs
## sum CPU values
sumSeries(collectd.web01.cpu-*)
## provide memory percentage
alias(asPercent(web01.mem.used, sumSeries(web01.mem.*)), "mem percent")
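Expressions in this DSL are passed to the HTTP API as target parameters of the render endpoint. The sketch below only builds such a URL; the base URL is a placeholder and the helper name render_url is hypothetical.

```python
from urllib.parse import urlencode


def render_url(base_url, targets, since="-1h", fmt="json"):
    """Build a /render URL carrying one or more DSL expressions as targets."""
    query = [("target", t) for t in targets] + [("from", since), ("format", fmt)]
    return f"{base_url}/render?{urlencode(query)}"


# "graphite-host" is an assumed hostname, not one taken from a real setup.
url = render_url("http://graphite-host", ["sumSeries(collectd.web01.cpu-*)"])
```

Fetching that URL with any HTTP client should return the summed series as JSON instead of a rendered image.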

MODULARITY IN GRAPHITE
Recently improved
A module can implement a storage strategy for graphite-web
Carbon modularity is a bit harder

THE GRAPHITE ECOSYSTEM
A wealth of tools are now graphite compatible

STATSD
Very popular metric service to integrate within applications.
Aggregates events in n-second windows
Ships off to graphite
statsd.increment 'session.open'
statsd.gauge 'session.active', 370
statsd.timing 'pdf.convert', 320
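Under the hood, client calls like these turn into small UDP datagrams of the form name:value|type. The sketch below formats the three calls above; the helper names and the 8125 port default are assumptions for the example.

```python
import socket


def statsd_packet(name, value, kind):
    """Format one statsd datagram: '<name>:<value>|<type>'."""
    return f"{name}:{value}|{kind}".encode("ascii")


def send(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send (host and port are assumed values)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


increment = statsd_packet("session.open", 1, "c")   # counter
gauge = statsd_packet("session.active", 370, "g")   # gauge
timing = statsd_packet("pdf.convert", 320, "ms")    # timer, in milliseconds
```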

COLLECTD
Very popular collection daemon with a graphite destination
Every conceivable system metric
A wealth of additional metric sources (such as a fast statsd server)
<plugin write_graphite>
  <carbon>
    Host "graphite-host"
  </carbon>
</plugin>

GRAPHITE-API
Alternative to graphite-web
Shares data manipulation code
No persistence of configuration

GRAFANA
Increasingly popular alternative to graphite-web, used together with graphite-api
Inspired by the kibana project for logstash
Optional persistence to elasticsearch for configuration

RIEMANN
Distributed system monitoring solution
(def graph! (graphite {:host "graphite-server"}))
(streams
  (where (service "http.404")
    (rate 5
      graph!)))

AND A LOT MORE
syslog-ng
logstash
descartes
tasseo
jmxtrans

HIGH VALUE PROJECT
Active and friendly developer community
Growing ecosystem
Very few contenders

ESSENTIALLY A SINGLE-HOST SOLUTION
Built in the days when cacti reigned
Innovative project at the time, which decoupled collection from storage and display

THE WHISPER FILE FORMAT
One file per metric
Optimized for space, not speed
Plenty of seeks
Only shared storage option is NFS…
In many ways can be seen as RRD in python

SCALING STRATEGIES
Tacked on after the fact
The decoupled architecture means that both graphite-web and carbon need upfront knowledge of where each shard lives

IT GETS A BIT HAIRY
Cluster topology must be stored on all nodes
Manual replication mechanism (through carbon-relay)
Changing cluster topology means re-assigning shards by hand

WHAT GRAPHITE CAN KEEP
Persistence of configuration
Local data manipulation

WHAT GRAPHITE WOULD NEED
Automatic shard assignment
Replication
Easy management
Easy cluster topology changes (horizontal scalability)

THE CYANITE APPROACH
Leveraging Apache Cassandra to store time-series
Leveraging Graphite for the interface

A CASSANDRA-BACKED CARBON REPLACEMENT
Written in clojure
Async I/O
No more whisper files
Fast storage
Horizontally scalable
Interfaced with graphite-web through graphite-cyanite

CYANITE DUTIES
Providing graphite-compatible input methods (carbon listeners)
Providing a way to retrieve metric names and metric time-series
Implemented as two protocols
A metric-store
A path-store
The rest is up to the graphite ecosystem, through graphite-cyanite
The recommended companion is graphite-api
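The split into a metric-store and a path-store can be sketched as two small interfaces. Cyanite itself is written in clojure; the Python classes, method names and in-memory implementations below are hypothetical and only illustrate the division of duties.

```python
from abc import ABC, abstractmethod
from fnmatch import fnmatchcase


class MetricStore(ABC):
    """Stores and retrieves time-series points."""
    @abstractmethod
    def insert(self, path, rollup, timestamp, value): ...
    @abstractmethod
    def fetch(self, path, rollup, start, end): ...


class PathStore(ABC):
    """Remembers which metric names exist and answers glob queries."""
    @abstractmethod
    def register(self, path): ...
    @abstractmethod
    def lookup(self, pattern): ...


class MemoryMetricStore(MetricStore):
    def __init__(self):
        self._series = {}

    def insert(self, path, rollup, timestamp, value):
        slot = timestamp - (timestamp % rollup)  # align to the rollup resolution
        self._series.setdefault((path, rollup), {}).setdefault(slot, []).append(value)

    def fetch(self, path, rollup, start, end):
        series = self._series.get((path, rollup), {})
        return [(t, sum(v) / len(v)) for t, v in sorted(series.items()) if start <= t <= end]


class MemoryPathStore(PathStore):
    def __init__(self):
        self._paths = set()

    def register(self, path):
        self._paths.add(path)

    def lookup(self, pattern):
        # Simplification: fnmatch's '*' also matches dots, unlike graphite globs.
        return sorted(p for p in self._paths if fnmatchcase(p, pattern))


paths, metrics = MemoryPathStore(), MemoryMetricStore()
for name in ("web01.cpu.user", "web01.cpu.system", "web01.mem.used"):
    paths.register(name)
metrics.insert("web01.cpu.user", 10, 1003, 2.0)
metrics.insert("web01.cpu.user", 10, 1007, 4.0)
found = paths.lookup("web01.cpu.*")
points = metrics.fetch("web01.cpu.user", 10, 1000, 1000)
```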

GETTING UP AND RUNNING
A simple configuration file
carbon:
  host: "127.0.0.1"
  port: 2003
  readtimeout: 30
  rollups:
    - period: 60480
      rollup: 10
    - period: 105120
      rollup: 600
http:
  host: "0.0.0.0"
  port: 8080
logging:
  level: info
  files:
    - "/var/log/cyanite/cyanite.log"
store:
  cluster: 'localhost'
  keyspace: 'metric'
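A quick way to read the rollups section, assuming that rollup is the resolution in seconds and period is the number of points kept at that resolution (an interpretation, not something stated on the slide): multiplying the two gives the retention of each rule.

```python
from datetime import timedelta

# Values copied from the configuration above.
ROLLUPS = [
    {"period": 60480, "rollup": 10},
    {"period": 105120, "rollup": 600},
]


def retention(rule):
    """Retention = number of points kept * seconds per point (assumed semantics)."""
    return timedelta(seconds=rule["period"] * rule["rollup"])


retentions = [retention(rule) for rule in ROLLUPS]
```

Under that assumption the two rules keep 7 days of 10-second points and 730 days of 10-minute points.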

GRAPHITE-CYANITE
with graphite-web:
STORAGE_FINDERS = ( 'cyanite.CyaniteFinder', )
CYANITE_URLS = ( 'http://host:port', )
with graphite-api:
cyanite:
  urls:
    - http://cyanite-host:port
finders:
  - cyanite.CyaniteFinder

LEADING ARCHITECTURE DRIVERS
Simplicity
Optimize for speed
As few moving parts as possible
Multi-tenancy
Resource efficiency
Remain compatible with the graphite ecosystem

CASSANDRA IS GREAT FOR TIME-SERIES
It bears repeating
High write to read ratio workload
No manual shard allocation or reassignment
Sorted wide columns mean efficient retrieval of data

SIMPLE SCHEMA
CREATE TABLE "metric" (
  tenant text,
  period int,
  rollup int,
  path text,
  time bigint,
  data list<double>,
  PRIMARY KEY((tenant, period, rollup, path), time)
)
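With this schema, reading one series for a time window is a single-partition range query: the four partition-key columns select the row and the time clustering column bounds the slice. The sketch below only builds the statement and its parameters, without a driver; the helper names, the empty tenant value and the timestamp alignment are assumptions.

```python
def align(timestamp, rollup):
    """Snap a timestamp down to the rollup resolution (assumed convention)."""
    return timestamp - (timestamp % rollup)


def fetch_statement(tenant, period, rollup, path, start, end):
    """Return a CQL range query and its bind parameters for one series."""
    cql = (
        "SELECT time, data FROM metric "
        "WHERE tenant = ? AND period = ? AND rollup = ? AND path = ? "
        "AND time >= ? AND time <= ?"
    )
    params = (tenant, period, rollup, path, align(start, rollup), align(end, rollup))
    return cql, params


cql, params = fetch_statement(
    "", 60480, 10, "collectd.web01.load.shortterm", 1400000003, 1400000605
)
```

Because rows in a partition are sorted by time, the slice is read sequentially rather than through scattered seeks.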

TAKING ADVANTAGE OF WIDE COLUMNS

REPLACING MORE GRAPHITE PARTS, EXTENDING FUNCTIONALITY
Implement graphite's data manipulation functions
Remove the need for graphite-api or graphite-web when using grafana
Finish providing multi-tenancy options

PICKLE SUPPORT
Easier integration in existing architectures
Would allow integration with carbon-relay
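For reference, carbon's pickle protocol frames a pickled list of (path, (timestamp, value)) tuples with a 4-byte big-endian length header. The sketch below builds such a frame; the helper name pickle_frame and the sample metric are hypothetical.

```python
import pickle
import struct


def pickle_frame(metrics):
    """Frame a batch of metrics for carbon's pickle listener:
    4-byte big-endian length header followed by the pickled payload."""
    payload = pickle.dumps(metrics, protocol=2)
    return struct.pack("!L", len(payload)) + payload


# One batch containing a single (path, (timestamp, value)) tuple.
frame = pickle_frame([("collectd.web01.load.shortterm", (1400000000, 0.42))])
```

A receiver reads the header first, then exactly that many bytes, which is what a pickle-capable cyanite listener would have to implement.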

ALTERNATIVE INPUT METHODS
Support queue input of metrics
Collectd already supports shipping graphite data to Apache Kafka
Support the statsd protocol directly

PROVIDE A CYANITE LIBRARY
Easy, standard-compliant storage from JVM-based applications

BATCH OPERATIONS
Compactions of rolled up series
Dynamic thresholds
Great opportunity to leverage the cassandra & spark interaction

A FEW TAKE-AWAYS
Cassandra enabled a quick-win in about 1100 lines of clojure
Greatly simplified scaling strategy
Building block for a lot more
Good way to reduce technology creep if you're already using cassandra

THANKS!
Cyanite owes a lot to:
Max Penet (@mpenet) for the great alia library
Bruno Renie (@brutasse) for graphite-api, graphite-cyanite and the initial nudge
Datastax for the awesome cassandra java-driver
Its contributors
Apache Cassandra obviously
@pyr – #CassandraSummit
