BETTER GRAPHITE 
STORAGE WITH CYANITE 
PIERRE-YVES RITSCHARD 
@PYR 
#CASSANDRASUMMIT 
@PYR 
CTO at exoscale, the safe home for your cloud applications 
Open source developer: pithos, cyanite, riemann, collectd… 
Recovering Operations Engineer
AIM OF THIS TALK 
Presenting graphite and its ecosystem 
Presenting cyanite 
Showcasing simplicity through Cassandra
OUTLINE 
Graphite overview 
The problem with graphite 
Cyanite solutions & internals 
Looking forward
GRAPHITE OVERVIEW
FROM THE SITE 
Graphite does two things: 
1. Store numeric time-series data 
2. Render graphs of this data on demand 
http://graphite.readthedocs.org
SCOPE 
A metrics tool 
Not a complete monitoring solution 
Interacts with metric submission tools
WHY ARE METRICS IMPORTANT 
Outside the scope of this talk 
Narrowing the gap between map and territory
GRAPHITE COMPONENTS 
whisper 
carbon 
graphite-web
WHISPER 
RRD-like storage library
Written in python 
Each file contains different roll-up periods and an aggregation 
method
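To make the roll-up idea concrete, here is a minimal Python sketch. It is not whisper's actual code: the function name `roll_up` and the sample data are assumptions made for illustration. It folds fine-grained points into coarser buckets using one aggregation method.

```python
from statistics import mean

def roll_up(points, step, aggregate=mean):
    """Fold (timestamp, value) points into buckets of `step` seconds.

    Each bucket is keyed by its start time and reduced with `aggregate`
    (mean, max, min, sum, ...), echoing the idea of one aggregation
    method per file.
    """
    buckets = {}
    for ts, value in points:
        bucket_start = ts - (ts % step)
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]

# Six points at 10-second resolution, rolled up to 30-second resolution.
raw = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0), (40, 5.0), (50, 6.0)]
rolled = roll_up(raw, step=30)  # two buckets, starting at 0 and 30
```

Passing `max` or `sum` instead of `mean` changes what the coarser series means, which is why the aggregation method is part of the storage definition.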
CARBON 
Asynchronous (twisted) TCP and UDP service to input time-series 
data 
Simple storage rules 
Split across several daemons
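As a sketch of what "input time-series data" looks like on the wire, carbon's plaintext protocol takes one `path value timestamp` line per data point. The helper names below and the host and port passed to `send_metric` are assumptions for illustration; the sample metric path is made up.

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, epoch seconds."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

def send_metric(host, port, path, value, timestamp=None):
    """Send a single metric over TCP. Host and port are placeholders."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

# Formatting only; nothing is sent until send_metric() is called.
line = format_metric("collectd.web01.cpu-0.idle", 93.5, 1400000000)
```

A call such as `send_metric("127.0.0.1", 2003, "collectd.web01.cpu-0.idle", 93.5)` would push one point to a listener on the port used in the configuration example later in this talk.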
CARBON-CACHE 
Main carbon daemon 
Temporarily caches values to RAM 
Writes out to whisper
CARBON-AGGREGATOR 
Aggregates data and forwards to carbon-cache 
Less I/O strain on the filesystem 
At the expense of resolution
CARBON-RELAY 
Provides sharding and replication 
Forwards to appropriate carbon-cache processes based on a 
provided hashing method
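The sharding and replication idea can be sketched in a few lines of Python. This is a deliberately simplified modulo scheme, not carbon-relay's own hashing implementation; the function name, the destination names and the replication factor are all assumptions for illustration.

```python
import hashlib

def pick_destinations(path, destinations, replication_factor=1):
    """Choose `replication_factor` distinct destinations for a metric path."""
    if not 1 <= replication_factor <= len(destinations):
        raise ValueError("replication_factor must be between 1 and len(destinations)")
    # Hash the path so the same metric always lands on the same destinations.
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(destinations)
    return [destinations[(start + i) % len(destinations)]
            for i in range(replication_factor)]

# Hypothetical carbon-cache endpoints.
caches = ["cache-a:2004", "cache-b:2004", "cache-c:2004"]
chosen = pick_destinations("web01.mem.used", caches, replication_factor=2)
```

Every node that routes or reads metrics has to agree on the same `destinations` list, which is the operational burden discussed later in the talk.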
GRAPHITE-WEB 
Simple Django-based HTTP API
Persists configuration to SQL 
Data query and manipulation through a very simple DSL 
Graph rendering 
Composer client interface to build graphs 
## sum CPU values 
sumSeries(collectd.web01.cpu-*)
## provide memory percentage 
alias(asPercent(web01.mem.used, sumSeries(web01.mem.*)), "mem percent")
SCREENSHOTS
SCREENSHOTS
ARCHITECTURE OVERVIEW
MODULARITY IN GRAPHITE 
Recently improved 
A module can implement a storage strategy for graphite-web 
Carbon modularity is a bit harder
THE GRAPHITE ECOSYSTEM 
A wealth of tools are now graphite compatible
STATSD 
Very popular metric service to integrate within applications. 
Aggregates events in n second windows 
Ships off to graphite 
statsd.increment 'session.open' 
statsd.gauge 'session.active', 370 
statsd.timing 'pdf.convert', 320
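The windowed aggregation can be sketched as follows. This is an illustrative Python model of counter aggregation only, not statsd's implementation; the function name and the event tuples are assumptions.

```python
from collections import Counter

def aggregate_counters(events, window):
    """Sum counter increments per (window start, metric name).

    `events` is an iterable of (timestamp, name, increment) tuples.
    """
    totals = Counter()
    for ts, name, increment in events:
        window_start = ts - (ts % window)
        totals[(window_start, name)] += increment
    return dict(totals)

# Four increments spread over two 10-second windows.
events = [(100, "session.open", 1), (103, "session.open", 1),
          (109, "session.open", 1), (112, "session.open", 1)]
flushed = aggregate_counters(events, window=10)
```

Each entry in `flushed` corresponds to one value that would be shipped to graphite at the end of a window.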
COLLECTD 
Very popular collection daemon with a graphite destination 
Every conceivable system metric
A wealth of additional metric sources (such as a fast statsd 
server) 
<plugin write_graphite>
  <carbon>
    Host "graphite-host"
  </carbon>
</plugin>
GRAPHITE-API 
Alternative to graphite-web 
Shares data manipulation code 
No persistence of configuration
GRAFANA 
Increasingly popular alternative to graphite-web, with 
graphite-api 
Inspired by the kibana project for logstash 
Optional persistence to elasticsearch for configuration
RIEMANN 
Distributed system monitoring solution 
(def graph! (graphite {:host "graphite-server"}))
(streams
  (where (service "http.404")
    (rate 5
      graph!)))
AND A LOT MORE 
syslog-ng 
logstash 
descartes 
tasseo 
jmxtrans
HIGH VALUE PROJECT 
Active and friendly developer community 
Growing ecosystem 
Very few contenders
THE PROBLEM WITH GRAPHITE
ESSENTIALLY A SINGLE-HOST SOLUTION
Built in a day when cacti reigned
Innovative project at the time which decoupled collection 
from storage and display
THE WHISPER FILE FORMAT 
One file per metric
Optimized for space, not speed 
Plenty of seeks 
Only shared storage option is NFS… 
In many ways can be seen as RRD in python
SCALING STRATEGIES 
Tacked on after the fact 
The decoupled architecture means that both graphite-web
and carbon need upfront knowledge of the location of shards
SCALING OVERVIEW
IT GETS A BIT HAIRY 
Cluster topology must be stored on all nodes 
Manual replication mechanism (through carbon-relay) 
Changing cluster topology means re-assigning shards by 
hand
WHAT GRAPHITE CAN KEEP 
Persistence of configuration 
Local data manipulation
WHAT GRAPHITE WOULD NEED 
Automatic shard assignment 
Replication 
Easy management 
Easy cluster topology changes (horizontal scalability)
THE CYANITE APPROACH 
Leveraging Apache Cassandra to store time-series 
Leveraging Graphite for the interface
A CASSANDRA-BACKED CARBON REPLACEMENT 
Written in clojure 
Async I/O 
No more whisper files 
Fast storage 
Horizontally scalable 
Interfaced with graphite-web through graphite-cyanite
CYANITE DUTIES 
Providing graphite-compatible input methods (carbon 
listeners) 
Providing a way to retrieve metric names and metric time-series 
Implemented as two protocols 
A metric-store 
A path-store 
The rest is up to the graphite ecosystem, through graphite-cyanite
The recommended companion is graphite-api
GETTING UP AND RUNNING 
A simple configuration file 
carbon:
  host: "127.0.0.1"
  port: 2003
  readtimeout: 30
  rollups:
    - period: 60480
      rollup: 10
    - period: 105120
      rollup: 600
http:
  host: "0.0.0.0"
  port: 8080
logging:
  level: info
  files:
    - "/var/log/cyanite/cyanite.log"
store:
  cluster: 'localhost'
  keyspace: 'metric'
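A quick worked calculation for the `rollups` section, assuming that `rollup` is the resolution in seconds and `period` is the number of points kept at that resolution (the slide does not spell this out). The `retention` helper is a hypothetical name.

```python
from datetime import timedelta

# Values copied from the rollups section of the configuration.
rollups = [
    {"period": 60480, "rollup": 10},
    {"period": 105120, "rollup": 600},
]

def retention(rule):
    """Assumed meaning: keep `period` points, one per `rollup` seconds."""
    return timedelta(seconds=rule["period"] * rule["rollup"])

retentions = [retention(rule) for rule in rollups]
# 60480 * 10 s   = 604800 s   = 7 days
# 105120 * 600 s = 63072000 s = 730 days
```

Under that reading, the example keeps one week of 10-second data and two years of 10-minute data.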
GRAPHITE-CYANITE 
with graphite-web:
  STORAGE_FINDERS = ( 'cyanite.CyaniteFinder', )
  CYANITE_URLS = ( 'http://host:port', )
with graphite-api:
  cyanite:
    urls:
      - http://cyanite-host:port
  finders:
    - cyanite.CyaniteFinder
LEADING ARCHITECTURE DRIVERS 
Simplicity 
Optimize for speed 
As few moving parts as possible 
Multi-tenancy 
Resource efficiency 
Remain compatible with the graphite ecosystem
CYANITE INTERNALS
CASSANDRA IS GREAT FOR TIME-SERIES 
It bears repeating 
High write to read ratio workload 
No manual shard allocation or reassignment 
Sorted wide columns mean efficient retrieval of data
A NEW STACK
SIMPLE SCHEMA 
CREATE TABLE "metric" (
  tenant text,
  period int,
  rollup int,
  path text,
  time bigint,
  data list<double>,
  PRIMARY KEY((tenant, period, rollup, path), time)
)
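One way a data point could map onto this schema is sketched below. This is an assumption-laden illustration, not necessarily the statement cyanite issues: the `build_insert` helper, the time-bucketing rule and the TTL derived from `period * rollup` are all hypothetical. It only builds the statement and its bound values, so no Cassandra driver is needed to follow it.

```python
def build_insert(path, value, timestamp, period, rollup, tenant=""):
    """Return a CQL statement and its bound values for one data point."""
    bucket = timestamp - (timestamp % rollup)  # align to the resolution
    ttl = period * rollup                      # seconds to keep the point
    cql = (
        'UPDATE "metric" USING TTL ? SET data = data + ? '
        "WHERE tenant = ? AND period = ? AND rollup = ? AND path = ? AND time = ?"
    )
    params = (ttl, [float(value)], tenant, period, rollup, path, bucket)
    return cql, params

cql, params = build_insert("web01.mem.used", 42, 1400000007,
                           period=60480, rollup=10)
```

The four partition-key columns pin all points of one series at one resolution to a single partition, and `time` as the clustering column keeps them sorted, so a time-range read is a contiguous slice.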
TAKING ADVANTAGE OF WIDE COLUMNS
LOOKING FORWARD
REPLACING MORE GRAPHITE PARTS, EXTENDING 
FUNCTIONALITY 
Implement graphite's data manipulation functions 
Remove the need for graphite-api or graphite-web when 
using grafana 
Finish providing multi-tenancy options
PICKLE SUPPORT 
Easier integration in existing architectures 
Would allow integration with carbon-relay
ALTERNATIVE INPUT METHODS 
Support queue input of metrics 
Collectd already supports shipping graphite data to Apache 
Kafka 
Support the statsd protocol directly
PROVIDE A CYANITE LIBRARY 
Easy, standard-compliant storage from JVM based 
applications
BATCH OPERATIONS 
Compactions of rolled up series 
Dynamic thresholds 
Great opportunity to leverage the cassandra & spark 
interaction
A FEW TAKE-AWAYS 
Cassandra enabled a quick win in about 1100 lines of clojure
Greatly simplified scaling strategy 
Building block for a lot more 
Good way to reduce technology creep if you're already using 
cassandra
THANKS ! 
Cyanite owes a lot to: 
Max Penet (@mpenet) for the great alia library 
Bruno Renie (@brutasse) for graphite-api, graphite-cyanite 
and the initial nudge 
Datastax for the awesome cassandra java-driver 
Its contributors 
Apache Cassandra obviously 
@pyr – #CassandraSummit

Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Dernier (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

BETTER GRAPHITE STORAGE WITH CYANITE

  • 1. BETTER GRAPHITE STORAGE WITH CYANITE PIERRE-YVES RITSCHARD @PYR #CASSANDRASUMMIT
  • 2. @PYR CTO at exoscale, the safe home for your cloud applications Open source developer: pithos, cyanite, riemann, collectd… Recovering Operations Engineer
  • 3. AIM OF THIS TALK Presenting graphite and its ecosystem Presenting cyanite Showcasing simplicity through Cassandra
  • 4. OUTLINE Graphite overview The problem with graphite Cyanite solutions & internals Looking forward
  • 6. FROM THE SITE Graphite does two things: 1. Store numeric time-series data 2. Render graphs of this data on demand http://graphite.readthedocs.org
  • 7. SCOPE A metrics tool Not a complete monitoring solution Interacts with metric submission tools
  • 8. WHY ARE METRICS IMPORTANT Outside the scope of this talk Narrowing the gap between map and territory
  • 9. GRAPHITE COMPONENTS whisper carbon graphite-web
  • 10. WHISPER RRD-like storage library Written in Python Each file contains different roll-up periods and an aggregation method
  • 11. CARBON Asynchronous (twisted) TCP and UDP service to input time-series data Simple storage rules Split across several daemons
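Carbon's TCP listener accepts a line-based plaintext format, one data point per line: metric path, value, and Unix timestamp separated by spaces. The sketch below formats such lines and, optionally, ships them over TCP. The host, the metric name, and the helper names are assumptions made for illustration; 2003 is the conventional plaintext port and is also the port used in the cyanite configuration shown later in this deck.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    # One plaintext-protocol line: "<path> <value> <timestamp>" plus a newline.
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="127.0.0.1", port=2003):
    # Hypothetical helper: host and port are assumptions for a local carbon
    # (or cyanite) listener. Sends all lines over a single TCP connection.
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Formatting only; nothing is sent unless send_metrics() is called.
line = format_metric("collectd.web01.cpu-0.user", 12.5, 1400000000)
print(line, end="")
```

Calling send_metrics([line]) would deliver the point to whichever daemon listens on that port.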
  • 12. CARBON-CACHE Main carbon daemon Temporarily caches values to RAM Writes out to whisper
  • 13. CARBON-AGGREGATOR Aggregates data and forwards to carbon-cache Less I/O strain on the filesystem At the expense of resolution
  • 14. CARBON-RELAY Provides sharding and replication Forwards to appropriate carbon-cache processes based on a provided hashing method
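To illustrate what "forwarding based on a hashing method" means, here is a minimal consistent-hash ring that maps a metric path to one of several carbon-cache destinations. It is a generic sketch, not carbon-relay's actual implementation: the class name, the number of virtual nodes, and the destination strings are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring (illustrative, not carbon's own code)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring several times ("virtual nodes")
        # so that keys spread more evenly across destinations.
        self._ring = []
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}:{i}"), node))
        self._ring.sort()
        self._keys = [position for position, _ in self._ring]

    @staticmethod
    def _hash(key):
        # First 32 bits of an MD5 digest, used as the ring position.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def node_for(self, metric_path):
        # Walk clockwise to the first ring entry past the key's position,
        # wrapping around at the end of the ring.
        idx = bisect.bisect(self._keys, self._hash(metric_path)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical destinations.
ring = HashRing(["cache-a:2004", "cache-b:2004", "cache-c:2004"])
target = ring.node_for("collectd.web01.cpu-0.user")
print(target)
```

The same metric path always lands on the same destination, which is why every relay and every graphite-web instance needs an identical view of the node list.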
  • 15. GRAPHITE-WEB Simple Django-based HTTP API Persists configuration to SQL Data query and manipulation through a very simple DSL Graph rendering Composer client interface to build graphs
      ## sum CPU values
      sumSeries("collectd.web01.cpu-*")
      ## provide memory percentage
      alias(asPercent(web01.mem.used, sumSeries(web01.mem.*)), "mem percent")
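DSL expressions such as the ones above are passed to graphite-web's render endpoint as the target query parameter. The sketch below only builds such a URL; the base URL and the helper name are assumptions, while target, from, and format are parameters of the render API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def render_url(base_url, target, since="-1h", fmt="json"):
    # urlencode takes care of quoting parentheses, commas, quotes and spaces.
    query = urlencode({"target": target, "from": since, "format": fmt})
    return f"{base_url.rstrip('/')}/render?{query}"


# Expression taken from the slide; the host name is hypothetical.
expression = 'alias(asPercent(web01.mem.used, sumSeries(web01.mem.*)), "mem percent")'
url = render_url("http://graphite.example.com", expression)
print(url)
```

Fetching that URL from a running graphite-web or graphite-api instance would return the evaluated series as JSON.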
  • 19. MODULARITY IN GRAPHITE Recently improved A module can implement a storage strategy for graphite-web Carbon modularity is a bit harder
  • 20. THE GRAPHITE ECOSYSTEM A wealth of tools are now graphite compatible
  • 21. STATSD Very popular metric service to integrate within applications. Aggregates events in n-second windows Ships off to graphite
      statsd.increment 'session.open'
      statsd.gauge 'session.active', 370
      statsd.timing 'pdf.convert', 320
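Client calls like these translate into small UDP datagrams in the statsd line format, name:value|type, where the type is c for counters, g for gauges, and ms for timings. A minimal sketch of that encoding follows; the helper names and the destination host are assumptions, and 8125 is the port statsd conventionally listens on.

```python
import socket


def counter(name, delta=1):
    return f"{name}:{delta}|c"


def gauge(name, value):
    return f"{name}:{value}|g"


def timing(name, milliseconds):
    return f"{name}:{milliseconds}|ms"


def send(packet, host="127.0.0.1", port=8125):
    # Hypothetical helper: fire-and-forget UDP, as statsd clients typically do.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet.encode("ascii"), (host, port))


packets = [
    counter("session.open"),
    gauge("session.active", 370),
    timing("pdf.convert", 320),
]
print(packets)
```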
  • 22. COLLECTD Very popular collection daemon with a graphite destination Every conceivable system metric A wealth of additional metric sources (such as a fast statsd server)
      <plugin write_graphite>
        <carbon>
          Host "graphite-host"
        </carbon>
      </plugin>
  • 23. GRAPHITE-API Alternative to graphite-web Shares data manipulation code No persistence of configuration
  • 24. GRAFANA Increasingly popular alternative to graphite-web, with graphite-api Inspired by the kibana project for logstash Optional persistence to elasticsearch for configuration
  • 25. RIEMANN Distributed system monitoring solution
      (def graph! (graphite {:host "graphite-server"}))
      (streams
        (where (service "http.404")
          (rate 5 graph!)))
  • 26. AND A LOT MORE syslog-ng logstash descartes tasseo jmxtrans
  • 27. HIGH VALUE PROJECT Active and friendly developer community Growing ecosystem Very few contenders
  • 28. THE PROBLEM WITH GRAPHITE
  • 29. ESSENTIALLY A SINGLE-HOST SOLUTION Built in a day when Cacti reigned Innovative project at the time which decoupled collection from storage and display
  • 30. THE WHISPER FILE FORMAT One file per metric Optimized for space, not speed Plenty of seeks Only shared storage option is NFS… In many ways can be seen as RRD in Python
  • 31. SCALING STRATEGIES Tacked on after the fact The decoupled architecture means that both graphite-web and carbon need upfront knowledge of the locations of shards
  • 33. IT GETS A BIT HAIRY Cluster topology must be stored on all nodes Manual replication mechanism (through carbon-relay) Changing cluster topology means re-assigning shards by hand
  • 34. WHAT GRAPHITE CAN KEEP Persistence of configuration Local data manipulation
  • 35. WHAT GRAPHITE WOULD NEED Automatic shard assignment Replication Easy management Easy cluster topology changes (horizontal scalability)
  • 36. THE CYANITE APPROACH Leveraging Apache Cassandra to store time-series Leveraging Graphite for the interface
  • 37. A CASSANDRA-BACKED CARBON REPLACEMENT Written in clojure Async I/O No more whisper files Fast storage Horizontally scalable Interfaced with graphite-web through graphite-cyanite
  • 38. CYANITE DUTIES Providing graphite-compatible input methods (carbon listeners) Providing a way to retrieve metric names and metric time-series Implemented as two protocols: a metric-store and a path-store The rest is up to the graphite ecosystem, through graphite-cyanite The recommended companion is graphite-api
  • 39. GETTING UP AND RUNNING A simple configuration file
      carbon:
        host: "127.0.0.1"
        port: 2003
        readtimeout: 30
        rollups:
          - period: 60480
            rollup: 10
          - period: 105120
            rollup: 600
      http:
        host: "0.0.0.0"
        port: 8080
      logging:
        level: info
        files:
          - "/var/log/cyanite/cyanite.log"
      store:
        cluster: 'localhost'
        keyspace: 'metric'
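The two rollups entries in this configuration can be turned into retention windows with a little arithmetic. The sketch assumes that rollup is the resolution in seconds and period is the number of points kept at that resolution; that reading is an assumption, not something the slide states.

```python
# Values copied from the configuration slide.
ROLLUPS = [
    {"period": 60480, "rollup": 10},
    {"period": 105120, "rollup": 600},
]

SECONDS_PER_DAY = 86400


def retention_seconds(rule):
    # Assumed meaning: number of points kept times seconds per point.
    return rule["period"] * rule["rollup"]


for rule in ROLLUPS:
    days = retention_seconds(rule) / SECONDS_PER_DAY
    print(f"{rule['rollup']}s resolution kept for {days:g} days")
```

Under that assumption the first rule keeps 10-second points for 7 days and the second keeps 10-minute points for 730 days.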
  • 40. GRAPHITE-CYANITE
      with graphite-web:
        STORAGE_FINDERS = ( 'cyanite.CyaniteFinder', )
        CYANITE_URLS = ( 'http://host:port', )
      with graphite-api:
        cyanite:
          urls:
            - http://cyanite-host:port
        finders:
          - cyanite.CyaniteFinder
  • 41. LEADING ARCHITECTURE DRIVERS Simplicity Optimize for speed As few moving parts as possible Multi-tenancy Resource efficiency Remain compatible with the graphite ecosystem
  • 43. CASSANDRA IS GREAT FOR TIME-SERIES It bears repeating High write to read ratio workload No manual shard allocation or reassignment Sorted wide columns mean efficient retrieval of data
  • 45. SIMPLE SCHEMA
      CREATE TABLE "metric" (
        tenant text,
        period int,
        rollup int,
        path text,
        time bigint,
        data list<double>,
        PRIMARY KEY ((tenant, period, rollup, path), time)
      )
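A small in-memory model can make the shape of this schema more concrete: the composite partition key (tenant, period, rollup, path) selects one wide row, and time orders the cells inside it. The class below is only an illustration. Aligning timestamps to the rollup resolution and averaging the data list on read are assumptions, not confirmed cyanite behavior, and the sorted() call stands in for Cassandra's clustering order.

```python
from collections import defaultdict
from statistics import mean


class MetricTable:
    """Toy stand-in for the "metric" table (illustrative only)."""

    def __init__(self):
        # partition key -> {time bucket -> list of raw values}
        self._partitions = defaultdict(lambda: defaultdict(list))

    def insert(self, tenant, period, rollup, path, timestamp, value):
        # Assumption: points are aligned to the start of their rollup window.
        bucket = timestamp - (timestamp % rollup)
        self._partitions[(tenant, period, rollup, path)][bucket].append(value)

    def fetch(self, tenant, period, rollup, path, start, end):
        # One partition, one contiguous time-ordered slice.
        rows = self._partitions.get((tenant, period, rollup, path), {})
        return [(t, mean(values)) for t, values in sorted(rows.items())
                if start <= t <= end]


table = MetricTable()
key = ("", 60480, 10, "collectd.web01.cpu-0.user")
table.insert(*key, 1400000003, 1.0)
table.insert(*key, 1400000007, 3.0)
table.insert(*key, 1400000012, 5.0)
series = table.fetch(*key, 1400000000, 1400000060)
print(series)
```

A range query therefore touches a single partition and reads neighboring cells in order, which is the property the "sorted wide columns" point refers to.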
  • 46. TAKING ADVANTAGE OF WIDE COLUMNS
  • 48. REPLACING MORE GRAPHITE PARTS, EXTENDING FUNCTIONALITY Implement graphite's data manipulation functions Remove the need for graphite-api or graphite-web when using grafana Finish providing multi-tenancy options
  • 49. PICKLE SUPPORT Easier integration in existing architectures Would allow integration with carbon-relay
  • 50. ALTERNATIVE INPUT METHODS Support queue input of metrics Collectd already supports shipping graphite data to Apache Kafka Support the statsd protocol directly
  • 51. PROVIDE A CYANITE LIBRARY Easy, standard-compliant storage from JVM based applications
  • 52. BATCH OPERATIONS Compactions of rolled up series Dynamic thresholds Great opportunity to leverage the cassandra & spark interaction
  • 53. A FEW TAKE-AWAYS Cassandra enabled a quick win in about 1100 lines of Clojure Greatly simplified scaling strategy Building block for a lot more Good way to reduce technology creep if you're already using Cassandra
  • 54. THANKS ! Cyanite owes a lot to: Max Penet (@mpenet) for the great alia library Bruno Renie (@brutasse) for graphite-api, graphite-cyanite and the initial nudge Datastax for the awesome cassandra java-driver Its contributors Apache Cassandra obviously @pyr – #CassandraSummit