CASSANDRA &
BENCHMARKING
A holistic perspective
Agenda
1. This presentation covers performance benchmarking
for Cassandra-based systems
2. Discuss benchmarking in general
3. Define an Approach
4. Explore gotchas and things to look out for
5. Hear from you! (Prizes for best benchmarking stories)
Benchmarking
• Benchmark testing is the process of load testing a
component or an entire end-to-end IT system to determine
the performance characteristics of the application.
Benchmarking Properties
• Should be repeatable
• Should capture performance measurements from
successive runs
• Ideally there should be low variance between successive
tests
• Should highlight improvements or degradations caused by
system changes
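One way to make "low variance between successive tests" measurable is the
coefficient of variation (standard deviation divided by mean) across runs.
The sketch below is illustrative only: the throughput figures and the 5%
threshold are assumptions, not values from this presentation.

```python
import statistics


def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


def is_repeatable(samples, max_cv=0.05):
    """Treat a benchmark as repeatable if run-to-run variation is small.

    max_cv is an assumed threshold; choose one that suits your system.
    """
    return coefficient_of_variation(samples) <= max_cv


# Hypothetical throughput (requests/s) from five successive runs.
stable_runs = [1010.0, 990.0, 1005.0, 995.0, 1000.0]
# Hypothetical runs with too much variation to compare meaningfully.
noisy_runs = [500.0, 1000.0, 1500.0]

print(is_repeatable(stable_runs), is_repeatable(noisy_runs))
```

If the variation is too high, a change in the result cannot be attributed to
a change in the system.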
Modern Systems
• More often than not distributed.
• Many different types of system components
• Complex performance constraints
• Easily measured: network, CPU, memory and I/O
utilisation
• More difficult: technology-specific factors, e.g.
Cassandra's compaction impact and read performance
Justification for Benchmarking
• Simple:
• Will the system keep performing as the number of users grows?
• Complex:
• Cost Reduction
• Optimisation
• Growth Projection
• Total cost of ownership (TCO)
APPROACH
Caveats
• The more information you have the better…
• Any investment in systemic testing is generally a good
investment
• Simplify the goals/outcomes for business
• Automate as much as possible and formalise test
procedure to ensure adherence to quality measures.
• Be as interested in percentiles as in mean values
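A small worked example shows why percentiles matter as much as the mean.
It uses the nearest-rank method on a made-up latency sample; the numbers are
assumptions chosen for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample: 95 fast requests and 5 very slow ones (milliseconds).
latencies_ms = [10] * 95 + [500] * 5

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(mean_ms, p50_ms, p99_ms)
```

The mean (34.5 ms) describes no real request here: most users see 10 ms and
an unlucky few see 500 ms, which only the high percentiles reveal.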
Requirements
• Discover resource constraints
• Discover modes of failure
• Guarantee operation outside of usual parameters
• Ensure SLAs are being met
• Ensure operation over longer periods is consistent.
Basic Approach
• Distinguish component benchmark from system
benchmark.
• Component benchmarks are important: they define a basic
SLA for inter-component operations.
• A system is the sum of all its parts, not just each
component: component performance does not imply system
performance.
• Take corrective action from the bottom up (network,
hardware, compute resources) as well as from the top
down (API design, data access patterns).
Holistic Approach
• The system exists to service business requirements; work
backwards from them.
• Define the benchmark from the user's perspective.
• Technical goals + business goals must align.
• The system must function in its entirety; it is not sufficient
to performance test each component in isolation.
1. Define a Basic Traffic Model
• Example - Simple Storefront
• GET /product/list (50%)
• GET /product/{id} (20%)
• POST /product/{id}/order (20%)
• GET /orders/list (10%)
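The traffic model above can be turned into a request generator with a
weighted random choice. This is a minimal sketch, not a load driver: the
function name and the fixed seed are assumptions made for the example.

```python
import random

# Weights taken from the Simple Storefront example above.
TRAFFIC_MODEL = {
    "GET /product/list": 0.50,
    "GET /product/{id}": 0.20,
    "POST /product/{id}/order": 0.20,
    "GET /orders/list": 0.10,
}


def next_request(model, rng):
    """Pick the next operation to issue according to the model's weights."""
    operations = list(model)
    weights = list(model.values())
    return rng.choices(operations, weights=weights, k=1)[0]


# A seeded generator keeps the request sequence repeatable between runs.
rng = random.Random(42)
sample = [next_request(TRAFFIC_MODEL, rng) for _ in range(10_000)]
list_share = sample.count("GET /product/list") / len(sample)
print(round(list_share, 2))
```

Seeding the generator supports the repeatability property: two runs issue
the same sequence of operations.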
2. Define a User Profile
• User Type 1
• Browse heavy
• GET /product/list (70%)
• GET /product/{id} (20%)
• POST /product/{id}/order (5%)
• GET /orders/list (5%)
• User Type 2
• Compulsive buyers
• GET /product/list (30%)
• GET /product/{id} (20%)
• POST /product/{id}/order (30%)
• GET /orders/list (20%)
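Given a mix of user types, the two profiles combine into one aggregate
traffic model. The sketch below assumes a population of 20% type 1 and 80%
type 2 users; the dictionary keys and function name are made up for the
example.

```python
# Per-profile weights taken from the two user types above.
USER_PROFILES = {
    "type_1_browse_heavy": {
        "GET /product/list": 0.70,
        "GET /product/{id}": 0.20,
        "POST /product/{id}/order": 0.05,
        "GET /orders/list": 0.05,
    },
    "type_2_compulsive_buyer": {
        "GET /product/list": 0.30,
        "GET /product/{id}": 0.20,
        "POST /product/{id}/order": 0.30,
        "GET /orders/list": 0.20,
    },
}


def blended_traffic_model(profiles, population_mix):
    """Weight each profile by its share of the user population."""
    blended = {}
    for user_type, share in population_mix.items():
        for operation, weight in profiles[user_type].items():
            blended[operation] = blended.get(operation, 0.0) + share * weight
    return blended


# Assumed population mix for this example.
mix = {"type_1_browse_heavy": 0.20, "type_2_compulsive_buyer": 0.80}
model = blended_traffic_model(USER_PROFILES, mix)
for operation, weight in model.items():
    print(f"{operation}: {weight:.2f}")
```

With this mix, orders make up 25% of all requests, compared with 5% for a
purely browse-heavy population, so the write load on the system depends
strongly on who is using it.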
Peak Periods?
• Adding an hourly activity profile allows for a more useful
benchmark.
• Can be expressed as active user count.
• It is simple to assign a probability to each type of user
on the system at a given hour.
• E.g. 20% type 1, 80% type 2.
• The ideal circumstance is to use real data for these
models if any is available.
• Distributed load drivers coordinate to meet the hourly user
count.
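The hourly user count and the user type mix can be combined into a plan for
each load driver. This is a sketch under stated assumptions: the hourly
figures, the number of drivers and all names are hypothetical, and real
drivers would need a coordination mechanism that is not shown.

```python
def split_evenly(total, parts):
    """Split total into parts integers that differ by at most one."""
    base, extra = divmod(total, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


def users_by_type(active_users, mix_percent):
    """Turn a percentage mix into whole user counts that sum to the total."""
    counts = {t: active_users * pct // 100 for t, pct in mix_percent.items()}
    # Give any rounding remainder to the most common user type.
    remainder = active_users - sum(counts.values())
    counts[max(mix_percent, key=mix_percent.get)] += remainder
    return counts


# Hypothetical active user counts for three hours of the day.
HOURLY_ACTIVE_USERS = {3: 1_000, 12: 9_000, 20: 15_000}
# The example mix from the slide: 20% type 1, 80% type 2.
MIX_PERCENT = {"type_1": 20, "type_2": 80}
# Assumed number of load driver machines.
DRIVER_COUNT = 4


def plan_hour(hour):
    """Return, per user type, how many users each driver should simulate."""
    per_type = users_by_type(HOURLY_ACTIVE_USERS[hour], MIX_PERCENT)
    return {t: split_evenly(n, DRIVER_COUNT) for t, n in per_type.items()}


print(plan_hour(20))
```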
Peak Periods?
[Chart: active users (y-axis, 0 to 16,000) by hour of day (x-axis, 0 to 22)]
Tooling
• JMeter
• The Grinder
• Jolokia (JMX)
• Logstash / StatsD
• Codahale Metrics
• Graphite (Visualisation)
• iostat / dstat, iftop, netstat, htop, etc.
• cassandra-stress (useful for a basic sanity check)
CASSANDRA
Specifics
Considerations
• Cassandra’s append-only writes mean that writes are
consistently fast, given sufficient resources
• Compaction has a different impact depending on the
strategy you use (size-tiered, STCS, is lighter than
leveled, LCS).
• Pending compactions tend to back up more during
load-oriented testing
• Read performance varies significantly depending on:
• Spread of column mutations across SSTables
• Compaction strategy (STCS is less efficient than LCS when
mutations are spread across SSTables)
• Number of reads for the same row key (whether or not we are
exercising the key cache)
• The consistency level (this also applies to writes)
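The consistency level determines how many replicas must respond before a
request completes, which is why it affects measured latency. The sketch
below shows the arithmetic for QUORUM and the usual overlap rule; the
function names are made up for the example.

```python
def quorum(replication_factor):
    """Number of replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def replicas_overlap(read_replicas, write_replicas, replication_factor):
    """True if every read is guaranteed to touch at least one replica
    that acknowledged the latest write (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))  # replicas a QUORUM request waits for
print(replicas_overlap(quorum(rf), quorum(rf), rf))  # QUORUM reads and writes
print(replicas_overlap(1, 1, rf))  # ONE reads and ONE writes
```

A benchmark run at ONE and a production system run at QUORUM wait on a
different number of replicas, so their latency figures are not comparable.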
Common Issues
• Poor query design (unbounded queries, abuse of ALLOW
FILTERING) and other anti-patterns.
• Poor capacity planning (disk, memory, CPU, etc.).
• Many failed requests on coordinators may lead to
resources being over-used for hinted handoff.
• If a node is memory-constrained, you may get JVM pauses
due to garbage collection.
• Poor network connectivity and incorrect consistency
levels may lead to more timeouts.
• It is possible to have hotspots in Cassandra if you have
not modelled keys correctly.
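Key hotspots can often be spotted in the workload before any load is
applied, by counting how requests spread across partition keys. The sketch
below uses a made-up order workload; the key layouts and names are
assumptions, and it does not model how Cassandra places partitions on nodes.

```python
from collections import Counter


def busiest_partition_share(partition_keys):
    """Fraction of all requests that hit the single busiest partition."""
    counts = Counter(partition_keys)
    return max(counts.values()) / len(partition_keys)


# Hypothetical workload: 1,000 orders placed on one day by 100 customers.
orders = [("2024-01-01", f"customer-{i % 100}") for i in range(1000)]

# Key choice A: partition by day only. Every write lands on one partition.
by_day = [day for day, _ in orders]
# Key choice B: partition by day and customer. Writes spread out.
by_day_and_customer = [(day, customer) for day, customer in orders]

print(busiest_partition_share(by_day))
print(busiest_partition_share(by_day_and_customer))
```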
What to collect during test?
• Read / Write latency per CF (nodetool cfstats)
• No. of reads / writes (nodetool cfstats)
• No. of pending compactions
• Thread pool usage, especially pending (nodetool tpstats)
• Correlate with
• Disk I/O
• CPU
• Memory usage
• Visualise as much as possible and use overlays for
correlation.
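Alongside visual overlays, a correlation coefficient gives a quick numeric
check of whether two metrics move together. The sketch below computes a
Pearson correlation by hand; the two time series are invented sample data,
not measurements.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples taken at the same timestamps during a test run.
pending_compactions = [2, 5, 9, 14, 20, 27]
read_latency_ms = [3.1, 3.4, 4.0, 5.2, 6.9, 8.8]

r = pearson(pending_compactions, read_latency_ms)
print(round(r, 3))
```

A value close to 1 suggests the metrics rise together, but it does not show
which one causes the other, so treat it as a prompt for further
investigation.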
Points to Remember
• Latency reported by Cassandra is internal, so it is only
useful for telling whether Cassandra I/O is performing
adequately. Graph it to get the most value, or use OpsCenter.
• Add metrics at every tier in your system, and make sure it
is possible to correlate the above number with latency in
other parts of the system.
• Soak testing is critical with Cassandra: the performance of
an empty system may be very different from performance once
disk utilisation and compaction requirements grow.
• Experiment with settings for easy gains. Some CFs may
benefit from RowCache.
YOUR STORIES
Best two stories get books from O’Reilly
