Monitoring Cassandra with graphite using Yammer Coda-Hale Library

JMXExpress
Transporting Cassandra Metrics
To Graphite

Cassandra Is Awesome
● No Single Point of Failure
● Fault Tolerant
● Multi-DC Is A Picnic
● Great Properties That Let Ops Teams Sleep at 2 AM

Robustness Has a Price
● C* Isn’t A Fire and Forget System :(
● Most of the Time You Don’t Notice Problems
o Things can go up/down for a few minutes
o C* simply queues requests and services keep running, so nobody notices

Be Proactive
Do Daily/Weekly Checkups to detect and
prevent Problems:
● Capacity
● Exceptions
● Performance Bottlenecks
● Data Modeling Issues

Reactive
● Something Will Go Wrong:
o Hardware Failures
o Bugs
o Malicious or Non-Malicious Users
● Alarms: NOC, PagerDuty

Proactive or Reactive?
● You Need Data
o Form Alerts
o Find Anomalies
o Trends
o Debugging
● You Should Monitor Everything

Gathering Metrics
● Cassandra
o OpsCenter
o JMX
o Nodetool
o Logs
● Environment
o CPU, Memory, Disks, Network, …
o Logs
o JVM

Give Data Context
You Should Give the Data Context…
Otherwise It’s Just Pretty Graphs…

JMX
● Java Management Extensions
● Complex…
● Resources Are Presented as Objects with Attributes
● Used for Both Monitoring and Actions

Native JMX
● Unfriendly way to get metrics
o Requires Java
o Slow and has memory leaks
o Nightmare for Ops (Network/Security)
[Diagram: JMX connection handshake between Client and Cassandra]
1- Client initiates a connection to Cassandra on port 7199
2- Cassandra replies with a Hostname:Port
3- Client gets the new host/port, drops the old connection, and connects to the new host/port (a port in the range 1024-65535)
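A minimal Java sketch of the client side of that handshake, using only the JDK's `javax.management.remote` API. The class name `CassandraJmxProbe` is ours, and the URL assumes the default `jmxrmi` service name registered by the standard JMX agent. `JMXConnectorFactory.connect()` performs both steps shown above: it looks up the RMI stub in the registry on 7199, then opens a second connection to the host:port advertised in that stub. That second port is why firewalls must allow more than 7199 unless the RMI server port is fixed in the JVM's configuration.

```java
import java.net.MalformedURLException;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CassandraJmxProbe {
    // RMI-registry style JMX URL; 7199 is Cassandra's default JMX port.
    public static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad JMX URL for " + host + ":" + port, e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java CassandraJmxProbe <host> [port]");
            return;
        }
        String host = args[0];
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7199;
        // connect() does the registry lookup on `port`, then reconnects to the
        // host:port embedded in the returned stub (step 3 of the diagram).
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            Set<ObjectName> names =
                    mbeans.queryNames(new ObjectName("org.apache.cassandra.metrics:*"), null);
            for (ObjectName name : names) {
                System.out.println(name);
            }
        }
    }
}
```

Run it as `java CassandraJmxProbe <cassandra-host>` to list every MBean in the metrics domain; without arguments it only prints usage.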

JMX Tools
● Visual
o JConsole
o VisualVM
o Commercial
● Command Line
o jmxterm
o jmxsh
● Jolokia
● MX4J

JMX Syntax
[domain]:[key1]=[value1],[key2]=[value2] …
org.apache.cassandra.metrics:type=ColumnFamily,keyspace=outbrain,scope=user_events,name=TotalDiskSpaceUsed
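To see how that syntax decomposes, the JDK's `javax.management.ObjectName` can parse the example name directly. A small sketch (the class and helper names are hypothetical):

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxNameParts {
    // Parses "[domain]:[key1]=[value1],[key2]=[value2]..." into an ObjectName.
    public static ObjectName parse(String text) {
        try {
            return new ObjectName(text);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("not a valid JMX object name: " + text, e);
        }
    }

    public static void main(String[] args) {
        ObjectName name = parse("org.apache.cassandra.metrics:type=ColumnFamily,"
                + "keyspace=outbrain,scope=user_events,name=TotalDiskSpaceUsed");
        System.out.println(name.getDomain());                // org.apache.cassandra.metrics
        System.out.println(name.getKeyProperty("type"));     // ColumnFamily
        System.out.println(name.getKeyProperty("keyspace")); // outbrain
        System.out.println(name.getKeyProperty("scope"));    // user_events
        System.out.println(name.getKeyProperty("name"));     // TotalDiskSpaceUsed
    }
}
```

Key properties are unordered, so looking them up by key is more robust than splitting the string on commas.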

JMX Domains
org.apache.cassandra
● db
● internal
● net
● request
org.apache.cassandra.metrics

JMX Types
org.apache.cassandra.metrics: type=
● Cache
● Client
● ClientRequest
● ClientRequestMetrics
● ColumnFamily
● CommitLog
● Compaction
● DroppedMessages
● FileCache
● Storage
● ThreadPools
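Because the type is just one key property, a JMX property-list pattern such as `org.apache.cassandra.metrics:type=Cache,*` selects every metric of a single type. The sketch below (hypothetical class name) checks names offline with `ObjectName.apply`. The same pattern object can be passed to `MBeanServerConnection.queryNames` on a live connection.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxTypeFilter {
    public static ObjectName name(String text) {
        try {
            return new ObjectName(text);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("not a valid JMX object name: " + text, e);
        }
    }

    // Property-list pattern: type must equal `type`, any other keys may be present.
    public static ObjectName typePattern(String type) {
        return name("org.apache.cassandra.metrics:type=" + type + ",*");
    }

    public static boolean isOfType(String objectName, String type) {
        return typePattern(type).apply(name(objectName));
    }

    public static void main(String[] args) {
        String cf = "org.apache.cassandra.metrics:type=ColumnFamily,keyspace=outbrain,"
                + "scope=user_events,name=TotalDiskSpaceUsed";
        System.out.println(isOfType(cf, "ColumnFamily")); // true
        System.out.println(isOfType(cf, "Cache"));        // false
    }
}
```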

Coda-Hale Metrics
● A Toolkit Called Metrics
o By Yammer / Coda Hale
● Easy to Use
● Easy to Read (If you speak Java)
● Popular

Types of Metrics
● Gauge: Instantaneous value
● Counter: number that can be incremented/decremented
● Meter: rate of events over time (requests/second, over 1, 5 and 15 minutes)
● Histogram: statistical distribution
o 50th, 75th, 95th, 98th, 99th and 99.9th percentiles
o average/median/min/max/stddev
● Timer: rate of events plus a histogram of their duration
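The Meter's 1-, 5- and 15-minute rates are exponentially weighted moving averages rather than plain windows. Below is a library-free sketch of the one-minute rate. It assumes the 5-second tick and the smoothing factor 1 - e^(-5/60) that, to our understanding, the Coda-Hale `EWMA` class uses; check the source of your library version. `OneMinuteRate` and `rateAfter` are hypothetical names.

```java
public class OneMinuteRate {
    private static final double TICK_SECONDS = 5.0;
    // alpha = 1 - e^(-tick / window) for a 60-second window.
    private static final double ALPHA = 1.0 - Math.exp(-TICK_SECONDS / 60.0);

    private long uncounted = 0;        // events marked since the last tick
    private double ratePerSecond = 0.0;
    private boolean initialized = false;

    public void mark(long events) {
        uncounted += events;
    }

    // A real meter calls this from a scheduler every TICK_SECONDS.
    public void tick() {
        double instantRate = uncounted / TICK_SECONDS;
        uncounted = 0;
        if (initialized) {
            ratePerSecond += ALPHA * (instantRate - ratePerSecond);
        } else {
            ratePerSecond = instantRate; // seed with the first observed rate
            initialized = true;
        }
    }

    public double rate() {
        return ratePerSecond;
    }

    // Simulates busyTicks ticks of eventsPerTick events each, then idleTicks empty ticks.
    public static double rateAfter(int busyTicks, long eventsPerTick, int idleTicks) {
        OneMinuteRate meter = new OneMinuteRate();
        for (int i = 0; i < busyTicks; i++) {
            meter.mark(eventsPerTick);
            meter.tick();
        }
        for (int i = 0; i < idleTicks; i++) {
            meter.tick();
        }
        return meter.rate();
    }

    public static void main(String[] args) {
        System.out.println(rateAfter(12, 100, 0));  // 20.0: a steady 100 events per 5 s
        System.out.println(rateAfter(12, 100, 12)); // roughly 20/e after one idle minute
    }
}
```

After one idle minute the rate has decayed to about 1/e of its previous value rather than to zero. This is why a one-minute rate lags behind a sudden drop in traffic.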

75th percentile is 650.75 µs
(75% of requests took 650.75 µs or less)
One-minute write rate is 13,915 per second
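A percentile like the one above is read from a sorted sample. The sketch below uses linear interpolation on the (n + 1) rank. That this matches the library's own snapshot calculation is our assumption. Note also that Coda-Hale histograms work on a sampled reservoir, not on every request, while this sketch uses all values. The class name and sample latencies are made up.

```java
import java.util.Arrays;

public class PercentileSketch {
    // q-quantile by linear interpolation between the two nearest ranks.
    public static double quantile(double[] samples, double q) {
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        if (samples.length == 0) {
            return 0.0;
        }
        double[] v = samples.clone();
        Arrays.sort(v);
        double pos = q * (v.length + 1);
        if (pos < 1) {
            return v[0];
        }
        if (pos >= v.length) {
            return v[v.length - 1];
        }
        double lower = v[(int) pos - 1];
        double upper = v[(int) pos];
        return lower + (pos - Math.floor(pos)) * (upper - lower);
    }

    public static void main(String[] args) {
        double[] latenciesMicros = {700, 100, 500, 300, 600, 200, 400};
        System.out.println(quantile(latenciesMicros, 0.75)); // 600.0
        System.out.println(quantile(latenciesMicros, 0.50)); // 400.0
    }
}
```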

Native JMX
● It’s overwhelming at first
● Hard to tell what they mean without the source
● Moves around a lot between versions
● Fortunately there is nodetool

Coda-Hale Reporting Interface
Coda-Hale Metrics Library:
● Default
o JMX
o Console
o CSV
o SLF4J
● Addons
o Ganglia / Graphite
● Community
o Cassandra / StatsD / New Relic / Splunk / CloudWatch
o Kafka / Riemann / TempDB / Munin / Riak / InfluxDB / Sematext
o MongoDB / OpenTSDB / Librato
o … More

Reporting Interface Activation
● Metrics library:
o Included in Cassandra since 1.1
o Before 2.0.2, it required writing your own Java agent reporter

Pluggable Metrics in Cassandra 2.0.2
● Starting with Cassandra 2.0.2, you only need to configure a special YAML file:
/etc/cassandra/metrics-reporter-config-graphite.yaml
● Load the Coda-Hale metrics by enabling the built-in agent in the cassandra-env.sh file:
-Dcassandra.metricsReporterConfigFile=yourCoolFile.yaml
● Save the file in the /etc/cassandra/ directory and give only the file name, not the full path; otherwise it will not work

Pluggable Metrics in Cassandra 2.0.2
Yaml Example:
graphite:
-
  period: 60
  timeunit: 'SECONDS'
  hosts:
  - host: 'graphite'
    port: 2003
  predicate:
    color: "white"
    useQualifiedName: true
    patterns:
    - "^org.apache.cassandra.metrics.Cache.+"
    - "^org.apache.cassandra.metrics.ClientRequest.+"
    - "^org.apache.cassandra.metrics.Storage.+"
    - "^org.apache.cassandra.metrics.ThreadPools.+"

Caveats of Pluggable Metrics
- Works only in 2.0.2 or higher
- Produces bad metric names: some begin with ‘.’, which does not fit the Graphite tree
- Limited ability to manipulate metrics

Our Approach
- Use an older version (2.0.3) of the Metrics library that works with all C* versions (down to 1.1)
- Write our own Java agent for backward compatibility
- Route the metrics through a Manipulator daemon so we can reformat them to fit our dashboards

The Java Agent
From the Documentation

The Java Agent
● Compiling it:
$ javac -cp $CASSANDRA_HOME/lib/metrics-core-2.0.3.jar:$CASSANDRA_HOME/lib/metrics-graphite-2.0.3.jar com/datastax/example/ReportAgent.java
$ jar -cfM reporter.jar .
(-M tells jar not to generate a manifest, so the current directory must already contain META-INF/MANIFEST.MF with a Premain-Class entry naming the agent class)
● Loading the Agent with Cassandra
(Edit cassandra-env.sh and add the following line at the bottom)
JVM_OPTS="-javaagent:/path/to/your/reporter.jar $JVM_OPTS"

Manipulating the Metrics
● Metrics come in org.apache.cassandra… syntax
● They don’t fit our Graphite scheme
● Some metrics begin with . (dot)
● We need to be able to filter and manipulate metrics

Manipulating the Metrics
We have built a simple Bash script that poses as a Graphite server and manipulates the metrics as we wish:
● We change the prefix
● We can filter metrics
● We keep a unified output
● We solve syntax issues, such as IP addresses, whose dots make Graphite read them as separate levels of the metric tree
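The authors' manipulator is a Bash script whose source is not shown. The following Java sketch is a hypothetical illustration of the per-line transform only, not the network daemon. It takes Graphite plaintext lines (`path value timestamp`), strips leading dots and the `org.apache.cassandra.metrics.` prefix, and swaps in a new prefix. It also joins dotted IPv4 addresses with underscores and drops paths that fail an allow-list regex. The class name, target prefix, regex and sample lines are all assumptions.

```java
import java.util.Optional;
import java.util.regex.Pattern;

public class MetricLineRewriter {
    private static final String SOURCE_PREFIX = "org.apache.cassandra.metrics.";
    // A dotted IPv4 address would otherwise become four levels of the Graphite tree.
    private static final Pattern IPV4 =
            Pattern.compile("\\b(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\b");

    private final String targetPrefix; // hypothetical, e.g. "cassandra.node1"
    private final Pattern allow;       // hypothetical allow-list on the rewritten path

    public MetricLineRewriter(String targetPrefix, String allowRegex) {
        this.targetPrefix = targetPrefix;
        this.allow = Pattern.compile(allowRegex);
    }

    // Rewrites one plaintext line; an empty result means "drop this metric".
    public Optional<String> rewrite(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            return Optional.empty(); // not "path value timestamp"
        }
        String path = parts[0];
        while (path.startsWith(".")) {
            path = path.substring(1); // names that begin with '.'
        }
        if (path.startsWith(SOURCE_PREFIX)) {
            path = path.substring(SOURCE_PREFIX.length());
        }
        path = IPV4.matcher(path).replaceAll("$1_$2_$3_$4");
        if (!allow.matcher(path).find()) {
            return Optional.empty();
        }
        return Optional.of(targetPrefix + "." + path + " " + parts[1] + " " + parts[2]);
    }

    public static void main(String[] args) {
        MetricLineRewriter rewriter =
                new MetricLineRewriter("cassandra.node1", "^(ClientRequest|Storage|Connection)\\.");
        String[] sample = {
            ".org.apache.cassandra.metrics.Connection.10.0.0.5.Timeouts 3 1400000000",
            "org.apache.cassandra.metrics.Cache.KeyCache.HitRate 0.9 1400000000"
        };
        for (String line : sample) {
            // Only the first line passes the allow-list and is printed as:
            // cassandra.node1.Connection.10_0_0_5.Timeouts 3 1400000000
            rewriter.rewrite(line).ifPresent(System.out::println);
        }
    }
}
```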

Metrics in Graphite (Sample: Write Latency Histograms)
