Why does my choice of storage matter with Cassandra?
Johnny Miller, Solutions Architect
@CyanMiller
www.linkedin.com/in/johnnymiller
Quote
“The single biggest predictor of success or failure
with a Cassandra deployment is in storage choice”
Patrick McFadin, Chief Evangelist for Cassandra, @PatrickMcFadin
©2014 DataStax Confidential. Do not distribute without consent. 2
Cassandra Storage Engine
Inserts/Updates
•  Memtable: organized in sorted order by row key and flushed to SSTables sequentially (read/write)
•  SSTable: an ordered map of key-value pairs (immutable, read-only)
•  Commit log: an append-only file structure that provides interim durability for writes before they are flushed to SSTables (write-only)
Reads
Deletes
•  Unlike most DBs, deleted data is not immediately
removed from disk.
•  A marker called a tombstone is written to indicate that the
column is deleted
•  Tombstones exist for a configurable period of time and
are only removed from disk by compaction after that period
has expired.
•  Deletes are just as fast as inserts :)
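The delete path above can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: `Cell`, `delete`, `compact` and `GC_GRACE_SECONDS` are hypothetical names, and the grace period is assumed to be 10 days (the default of the `gc_grace_seconds` table option).

```python
from dataclasses import dataclass
from typing import Optional

GC_GRACE_SECONDS = 864_000  # assumed grace period: 10 days


@dataclass(frozen=True)
class Cell:
    key: str
    value: Optional[str]  # None marks a tombstone
    written_at: int       # write time in seconds


def delete(log: list, key: str, now: int) -> None:
    """A delete is just another append: it writes a tombstone."""
    log.append(Cell(key, None, now))


def compact(log: list, now: int) -> list:
    """Keep the newest cell per key and drop tombstones past the grace period."""
    newest = {}
    for cell in log:
        current = newest.get(cell.key)
        if current is None or cell.written_at >= current.written_at:
            newest[cell.key] = cell
    survivors = []
    for cell in newest.values():
        expired = cell.value is None and now - cell.written_at > GC_GRACE_SECONDS
        if not expired:
            survivors.append(cell)
    return survivors


log = [Cell("user:1", "alice", 0)]
delete(log, "user:1", now=100)                        # nothing is removed yet
before = compact(log, now=200)                        # tombstone is still kept
after = compact(log, now=100 + GC_GRACE_SECONDS + 1)  # tombstone is finally dropped
```

In this model the delete costs exactly one append, which is why it is as cheap as an insert. The space is only reclaimed by a compaction that runs after the grace period.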
Compaction
•  Regular compaction of data in Cassandra is essential for a healthy and performant
cluster.
•  SSTables are immutable
•  Get rid of duplicate/overwritten data
•  Drop deleted data and tombstones
•  Data in SSTables is sorted by partition key, the effect of which is that while the
SSTables are being consolidated, the disk I/O is not random.
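Because every SSTable is already sorted, consolidation is a streaming merge rather than a random-access rewrite. A minimal Python sketch, assuming each SSTable is a list of `(key, timestamp, value)` tuples sorted by key (the tuple layout and the `compact_sstables` name are illustrative, not Cassandra's on-disk format):

```python
import heapq


def compact_sstables(*sstables):
    """Merge sorted runs in one sequential pass, keeping the newest version of each key."""
    merged = []
    for key, ts, value in heapq.merge(*sstables):
        if merged and merged[-1][0] == key:
            # Same key seen again: tuples sort by timestamp next, so this one is newer.
            merged[-1] = (key, ts, value)
        else:
            merged.append((key, ts, value))
    return merged


old = [("a", 1, "v1"), ("c", 1, "v1")]
new = [("a", 2, "v2"), ("b", 2, "v1")]
result = compact_sstables(old, new)
```

Each input is read front to back exactly once and the output is written front to back, which is the sequential I/O pattern the slide refers to.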
Compaction Strategies
•  There is a choice of three strategies Cassandra can use for compaction
and all have different disk I/O profiles and capacity requirements.
•  SizeTieredCompactionStrategy (default)
•  Using this strategy causes bursts in I/O activity while a compaction is in process
•  These I/O bursts can negatively affect read-heavy workloads, but typically do not impact
write performance.
•  Data highly likely to be spread across multiple SSTables i.e. multiple disk seeks
•  LeveledCompactionStrategy
•  ~90% of the time, data will be in only a single SSTable i.e. minimal disk seeks
•  However, there is significantly higher Disk I/O than size tiered compaction in order to
guarantee how many SSTables data may be spread across
•  Due to the high disk I/O, rarely appropriate on traditional HDDs
•  DateTieredCompactionStrategy (C* 2.0.11+ and 2.1.1+)
•  Stores data written within a certain period of time in the same SSTable.
•  Can store data that has been set to expire using TTL in an SSTable with other data
scheduled to expire at approximately the same time, so the whole SSTable can simply be
dropped without any compaction!
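The strategy is chosen per table. As a rough illustration, the helper below builds the CQL statement that switches a table's compaction class. The function name and the `metrics.events` keyspace and table are made up for the example. The statement is only printed, not executed against a cluster.

```python
STRATEGIES = {
    "SizeTieredCompactionStrategy",
    "LeveledCompactionStrategy",
    "DateTieredCompactionStrategy",
}


def compaction_cql(keyspace: str, table: str, strategy: str) -> str:
    """Return an ALTER TABLE statement selecting one of the three strategies above."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown compaction strategy: {strategy}")
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{strategy}'}};"
    )


# Hypothetical keyspace and table names.
statement = compaction_cql("metrics", "events", "LeveledCompactionStrategy")
print(statement)
```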
Choice of storage matters
•  Most databases rewrite modified data in place and writes are
buffered and then flushed to disk as random writes.
•  With Cassandra:
•  Disk writes are typically sequential append only
operations
•  On-disk tables are written in a sorted order so compaction
running time increases linearly with the amount of data
•  So choice of storage is pretty important!!
Disks and Configuration Options
Quote
“For many applications, we are no longer constrained by hard drive
capacity, but by seek speeds.
Essentially, a 7200 RPM hard drive is capable of delivering approximately
100 seeks per second, and this has not changed in more than 10 years,
even as disk capacity has been doubling every 18–24 months.
In fact, if you have a big data application which required a half a petabyte of
storage, what had previously required 1024 disk drives when we were using
500 GB drives, now that 3 TB disks are available, only 171 disk drives are
needed.
So a storage array capable of storing half a petabyte is now capable of 80%
fewer seeks.”
- Ted Ts’o Maintainer of the ext4 file system in the Linux kernel
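The arithmetic in the quote can be checked directly. The sketch below assumes "half a petabyte" means 512 TB with 1 TB = 1000 GB, which is the reading that reproduces the quoted drive counts. The per-drive seek rate is the figure from the quote.

```python
import math

SEEKS_PER_DRIVE = 100    # from the quote: roughly 100 seeks/s per 7200 RPM drive
CAPACITY_GB = 512_000    # assumption: half a petabyte taken as 512 TB


def drives_needed(drive_gb: int) -> int:
    """Number of whole drives required to hold CAPACITY_GB."""
    return math.ceil(CAPACITY_GB / drive_gb)


small_drives = drives_needed(500)    # array built from 500 GB drives
large_drives = drives_needed(3000)   # array built from 3 TB drives

seeks_before = small_drives * SEEKS_PER_DRIVE
seeks_after = large_drives * SEEKS_PER_DRIVE
reduction = 1 - seeks_after / seeks_before

print(small_drives, large_drives, seeks_before, seeks_after, f"{reduction:.0%}")
```

Under these assumptions the array drops from 1024 drives to 171 and from 102,400 to 17,100 seeks per second. That is a reduction of about 83%, in line with the roughly 80% in the quote.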
Hard Drive/Spinning Disk
This part actually has to move!
This bit spins around very fast
So what can we do?
•  Memory?
•  Caching can help, but the hit rate has to be extremely high to mitigate the mechanical
latency of spinning disks
•  Get rid of the moving parts!
•  Mechanical media will never be able to keep up under load
•  Today’s databases service multiple users with different access patterns
•  A relatively small number of concurrent disk reads can result in seconds of latency
•  SSDs don’t have moving parts
•  SSDs can eliminate entire classes of problems
•  With Cassandra in particular, you will save a lot of money on staff resources by investing
in SSDs up front
•  Compactions can be tough on flash, but it’s not as bad as you think
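A back-of-the-envelope model shows how a modest queue of random reads turns into seconds of latency on a spinning disk. It assumes one seek per read, a single drive serving requests one after another, and about 100 seeks per second for the HDD. The SSD figure of 10,000 is simply an assumed value two orders of magnitude higher, not a measurement.

```python
def worst_case_wait_seconds(queued_reads: int, seeks_per_second: float) -> float:
    """Time until the last of `queued_reads` random reads has been served."""
    return queued_reads / seeks_per_second


hdd_wait = worst_case_wait_seconds(200, 100)      # assumed 7200 RPM drive
ssd_wait = worst_case_wait_seconds(200, 10_000)   # assumed SSD, 100x the seek rate

print(f"HDD: {hdd_wait:.2f}s, SSD: {ssd_wait:.2f}s")
```

With 200 reads queued, the last request waits about two seconds on the HDD and about 20 ms on the assumed SSD.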
The best way to do it – SSDs!
•  What is an SSD?
•  Solid State Drive
•  Bits stored in NAND Flash Memory
•  No moving parts
•  “Seeks” 2-3 orders of magnitude faster than spinning disk
•  What’s the catch?
•  Smaller capacity
•  More expensive
•  Flash wears out
In practice, this is not a problem – if it makes you nervous, keep spares
The IO Scheduler
•  NOOP - use this scheduler if you know another IO device (like a RAID
card) will be doing its own IO scheduling. The NOOP scheduler is just
a pass-through.
•  Deadline - otherwise, use the deadline scheduler
•  Tell the OS the drive is non-rotational
•  Tune read-ahead way down – start at 0 and work your way up
Don’t forget to tune the OS for SSDs
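# Select the deadline I/O scheduler
echo deadline > /sys/block/<drive>/queue/scheduler
# Tell the OS the drive is non-rotational
echo 0 > /sys/block/<drive>/queue/rotational
# Start read-ahead at 0 and tune upward
blockdev --setra 0 /dev/<drive>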
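To confirm the settings took effect, the same sysfs files can be read back. The helper below is a small sketch with hypothetical function names. It relies on the kernel listing all available schedulers with the active one in square brackets, and on the `read_ahead_kb` file that sits next to `scheduler` and `rotational`. Newer multi-queue kernels list names such as `none` and `mq-deadline` instead of `noop` and `deadline`.

```python
from pathlib import Path


def active_scheduler(scheduler_text: str) -> str:
    """Return the bracketed entry, e.g. 'deadline' from 'noop [deadline] cfq'."""
    for name in scheduler_text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    raise ValueError(f"no active scheduler in {scheduler_text!r}")


def read_queue_settings(drive: str, sys_block: Path = Path("/sys/block")) -> dict:
    """Read the scheduler, rotational flag and read-ahead for one block device."""
    queue = sys_block / drive / "queue"
    return {
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
        "rotational": (queue / "rotational").read_text().strip() == "1",
        "read_ahead_kb": int((queue / "read_ahead_kb").read_text()),
    }
```

On a Linux host, `read_queue_settings("sda")` would return a dictionary to compare against the intended values. The device name is only an example.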
Use SSDs
•  More flexibility and substantial performance benefit
•  Typically 10x the performance for less than 2x the cost (potentially lower) when compared
with HDDs.
•  You can use LeveledCompactionStrategy
•  SSD drives can scale up to cope with larger compaction overheads while simultaneously
serving many random reads.
•  Netflix found that they could halve the total system cost to achieve the same level of
throughput.
•  Additionally the mean read request latency was reduced from 10ms to 2.2ms and 99th
percentile request latency was reduced from 65ms to 10ms.
•  http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
The worst way to do it
•  Shared Storage
•  Cassandra is a shared-nothing architecture with no single point of failure
•  Adding shared storage adds a single point of failure
•  Irrespective of the terrible performance – this alone is enough reason not to do
it.
Local storage configuration
•  RAID or JBOD?
•  RAID – Redundant Array of (Independent/Inexpensive) Disks
•  JBOD – Just a Bunch Of Disks
•  RAID
•  Common Cassandra RAID levels are 0, 1, 10
•  RAID-0 is most common, but means all data on node must be rebuilt
from other nodes when a drive fails
•  JBOD
•  Drives are listed individually in cassandra.yaml
•  Failed drives can be replaced individually
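In JBOD mode each drive gets its own entry under the `data_file_directories` key in cassandra.yaml. The snippet below renders such a fragment. The mount points and the helper name are hypothetical, and the output is meant to be pasted or templated into the real file rather than used as a complete configuration.

```python
def data_file_directories_yaml(mount_points):
    """Render one data_file_directories entry per drive."""
    lines = ["data_file_directories:"]
    lines += [f"    - {path}" for path in mount_points]
    return "\n".join(lines)


# Hypothetical mount points, one per physical drive.
fragment = data_file_directories_yaml(
    ["/mnt/ssd1/cassandra/data", "/mnt/ssd2/cassandra/data"]
)
print(fragment)
```

Replacing a failed drive then comes down to swapping one path in this list, which is what makes the spare-drive approach a quick config change.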
How to choose between RAID or JBOD?
•  Performance
•  For SSDs, not so much…
•  Compactions are usually throttled significantly below bus speed
•  So a single SSD usually has sufficient throughput
•  Throughput is really the only advantage RAID buys
•  Manageability
•  Pick the option that best fits the deployment scenario
•  If using SSD, and drives can be replaced, choose JBOD
•  Otherwise, RAID is probably the right choice.
How to choose between RAID or JBOD?
•  Cloud provider
•  EC2 ephemeral SSD can’t be replaced, use RAID (and don’t use EBS)
•  GCE persistent SSD volumes can be replaced, JBOD is useful
•  Not all drives are hot swappable
•  PCIe devices can’t conveniently be replaced
•  SSD Spares for JBOD mode
•  Keep spare SSDs online, but not in use
•  Allows the node to be easily brought back online
with a quick config change
Comparison Data
[Charts not reproduced: read and write latency in microseconds, one chart per device]
•  FusionIO ioDrive II
•  PNY XLR8 SSD (consumer grade MLC)
•  Samsung 840 Pro SSD (consumer grade MLC)
•  7200RPM SATA
•  7200RPM SAS
•  10K SATA
•  15K SAS
•  All Drives
•  All SSDs
Conclusion
•  Using SSDs is a good idea
•  Better response times
•  Less variance in performance
•  Significantly higher throughput so fewer servers needed
Thank You
We power the big data apps
that transform business.

Contenu connexe

Tendances

Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Community
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - RangerIsheeta Sanghi
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideKaran Singh
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
How to Secure Your Scylla Deployment: Authorization, Encryption, LDAP Authent...
How to Secure Your Scylla Deployment: Authorization, Encryption, LDAP Authent...How to Secure Your Scylla Deployment: Authorization, Encryption, LDAP Authent...
How to Secure Your Scylla Deployment: Authorization, Encryption, LDAP Authent...ScyllaDB
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisArnab Mitra
 
Greenplum Database Overview
Greenplum Database Overview Greenplum Database Overview
Greenplum Database Overview EMC
 
Learning from ZFS to Scale Storage on and under Containers
Learning from ZFS to Scale Storage on and under ContainersLearning from ZFS to Scale Storage on and under Containers
Learning from ZFS to Scale Storage on and under Containersinside-BigData.com
 
Storage Technology Overview
Storage Technology OverviewStorage Technology Overview
Storage Technology Overviewnomathjobs
 
Data Federation with Apache Spark
Data Federation with Apache SparkData Federation with Apache Spark
Data Federation with Apache SparkDataWorks Summit
 
AF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on FlashAF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on FlashCeph Community
 
JVM Performance Tuning
JVM Performance TuningJVM Performance Tuning
JVM Performance TuningJeremy Leisy
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsDataWorks Summit
 
Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFSDataWorks Summit
 

Tendances (20)

Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
Ceph as software define storage
Ceph as software define storageCeph as software define storage
Ceph as software define storage
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing Guide
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
How to Secure Your Scylla Deployment: Authorization, Encryption, LDAP Authent...
How to Secure Your Scylla Deployment: Authorization, Encryption, LDAP Authent...How to Secure Your Scylla Deployment: Authorization, Encryption, LDAP Authent...
How to Secure Your Scylla Deployment: Authorization, Encryption, LDAP Authent...
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Greenplum Database Overview
Greenplum Database Overview Greenplum Database Overview
Greenplum Database Overview
 
Learning from ZFS to Scale Storage on and under Containers
Learning from ZFS to Scale Storage on and under ContainersLearning from ZFS to Scale Storage on and under Containers
Learning from ZFS to Scale Storage on and under Containers
 
FreeBSD and Hardening Web Server
FreeBSD and Hardening Web ServerFreeBSD and Hardening Web Server
FreeBSD and Hardening Web Server
 
Storage Technology Overview
Storage Technology OverviewStorage Technology Overview
Storage Technology Overview
 
Spark
SparkSpark
Spark
 
Data Federation with Apache Spark
Data Federation with Apache SparkData Federation with Apache Spark
Data Federation with Apache Spark
 
AF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on FlashAF Ceph: Ceph Performance Analysis and Improvement on Flash
AF Ceph: Ceph Performance Analysis and Improvement on Flash
 
JVM Performance Tuning
JVM Performance TuningJVM Performance Tuning
JVM Performance Tuning
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Storage basics
Storage basicsStorage basics
Storage basics
 
Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFS
 

En vedette

Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State DrivesRick Branson
 
Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State DrivesDataStax Academy
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real WorldJeremy Hanna
 
Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in CassandraEd Anuff
 
Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Cassandra Day SV 2014: Designing Commodity Storage in Apache CassandraCassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Cassandra Day SV 2014: Designing Commodity Storage in Apache CassandraDataStax Academy
 
Cassandra v3.0 at Rakuten meet-up on 12/2/2015
Cassandra v3.0 at Rakuten meet-up on 12/2/2015Cassandra v3.0 at Rakuten meet-up on 12/2/2015
Cassandra v3.0 at Rakuten meet-up on 12/2/2015datastaxjp
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big DataDataStax Academy
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)jbellis
 
A deep look at the cql where clause
A deep look at the cql where clauseA deep look at the cql where clause
A deep look at the cql where clauseBenjamin Lerer
 
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...MongoDB
 
Mongo Performance Optimization Using Indexing
Mongo Performance Optimization Using IndexingMongo Performance Optimization Using Indexing
Mongo Performance Optimization Using IndexingChinmay Naik
 
CQL: SQL In Cassandra
CQL: SQL In CassandraCQL: SQL In Cassandra
CQL: SQL In CassandraEric Evans
 
Overcoming Scaling Challenges in MongoDB Deployments with SSD
Overcoming Scaling Challenges in MongoDB Deployments with SSDOvercoming Scaling Challenges in MongoDB Deployments with SSD
Overcoming Scaling Challenges in MongoDB Deployments with SSDMongoDB
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Rick Branson
 
Webinar: MongoDB 2.6 New Security Features
Webinar: MongoDB 2.6 New Security FeaturesWebinar: MongoDB 2.6 New Security Features
Webinar: MongoDB 2.6 New Security FeaturesMongoDB
 
Evgeniy Karelin. Mongo DB integration example solving performance and high lo...
Evgeniy Karelin. Mongo DB integration example solving performance and high lo...Evgeniy Karelin. Mongo DB integration example solving performance and high lo...
Evgeniy Karelin. Mongo DB integration example solving performance and high lo...Vlad Savitsky
 
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)Spark Summit
 

En vedette (20)

Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State Drives
 
Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State Drives
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in Cassandra
 
Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Cassandra Day SV 2014: Designing Commodity Storage in Apache CassandraCassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra
 
Cassandra v3.0 at Rakuten meet-up on 12/2/2015
Cassandra v3.0 at Rakuten meet-up on 12/2/2015Cassandra v3.0 at Rakuten meet-up on 12/2/2015
Cassandra v3.0 at Rakuten meet-up on 12/2/2015
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
 
A deep look at the cql where clause
A deep look at the cql where clauseA deep look at the cql where clause
A deep look at the cql where clause
 
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...
MongoDB Linux Porting, Performance Measurements and and Scaling Advantage usi...
 
Mongo Performance Optimization Using Indexing
Mongo Performance Optimization Using IndexingMongo Performance Optimization Using Indexing
Mongo Performance Optimization Using Indexing
 
CQL: SQL In Cassandra
CQL: SQL In CassandraCQL: SQL In Cassandra
CQL: SQL In Cassandra
 
Overcoming Scaling Challenges in MongoDB Deployments with SSD
Overcoming Scaling Challenges in MongoDB Deployments with SSDOvercoming Scaling Challenges in MongoDB Deployments with SSD
Overcoming Scaling Challenges in MongoDB Deployments with SSD
 
Cassandra compaction
Cassandra compactionCassandra compaction
Cassandra compaction
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)
 
Webinar: MongoDB 2.6 New Security Features
Webinar: MongoDB 2.6 New Security FeaturesWebinar: MongoDB 2.6 New Security Features
Webinar: MongoDB 2.6 New Security Features
 
Evgeniy Karelin. Mongo DB integration example solving performance and high lo...
Evgeniy Karelin. Mongo DB integration example solving performance and high lo...Evgeniy Karelin. Mongo DB integration example solving performance and high lo...
Evgeniy Karelin. Mongo DB integration example solving performance and high lo...
 
Linux Kernel I/O Schedulers
Linux Kernel I/O SchedulersLinux Kernel I/O Schedulers
Linux Kernel I/O Schedulers
 
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
Exactly-Once Streaming from Kafka-(Cody Koeninger, Kixer)
 

Similaire à Why does my choice of storage matter with cassandra?

Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...DataStax
 
Dell SSD og Flash teknologi i SAN
Dell SSD og Flash teknologi i SANDell SSD og Flash teknologi i SAN
Dell SSD og Flash teknologi i SANKenneth de Brucq
 
eFolder Webinar — Big News: Get Ready for Next-Gen BDR
eFolder Webinar — Big News: Get Ready for Next-Gen BDReFolder Webinar — Big News: Get Ready for Next-Gen BDR
eFolder Webinar — Big News: Get Ready for Next-Gen BDReFolder
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Community
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionSplunk
 
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Community
 
5 Tips for a More Efficient Data Center
5 Tips for a More Efficient Data Center5 Tips for a More Efficient Data Center
5 Tips for a More Efficient Data CenterWestern Digital
 
VMworld 2014: Databases in a Virtualized World
VMworld 2014:  Databases in a Virtualized WorldVMworld 2014:  Databases in a Virtualized World
VMworld 2014: Databases in a Virtualized WorldViolin Memory
 
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...VMworld
 
OOW13: It's a solid state-world
OOW13: It's a solid state-worldOOW13: It's a solid state-world
OOW13: It's a solid state-worldMarc Fielding
 
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...Red_Hat_Storage
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructurexKinAnx
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructuresolarisyourep
 
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red_Hat_Storage
 
Seagate Implementation of Dense Storage Utilizing HDDs and SSDs
Seagate Implementation of Dense Storage Utilizing HDDs and SSDsSeagate Implementation of Dense Storage Utilizing HDDs and SSDs
Seagate Implementation of Dense Storage Utilizing HDDs and SSDsRed_Hat_Storage
 
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016DataStax
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... CassandraInstaclustr
 

Similaire à Why does my choice of storage matter with cassandra? (20)

Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
 
Dell SSD og Flash teknologi i SAN
Dell SSD og Flash teknologi i SANDell SSD og Flash teknologi i SAN
Dell SSD og Flash teknologi i SAN
 
IaaS for DBAs in Azure
IaaS for DBAs in AzureIaaS for DBAs in Azure
IaaS for DBAs in Azure
 
eFolder Webinar — Big News: Get Ready for Next-Gen BDR
eFolder Webinar — Big News: Get Ready for Next-Gen BDReFolder Webinar — Big News: Get Ready for Next-Gen BDR
eFolder Webinar — Big News: Get Ready for Next-Gen BDR
 
Ssd collab13
Ssd   collab13Ssd   collab13
Ssd collab13
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
 
5 Tips for a More Efficient Data Center
5 Tips for a More Efficient Data Center5 Tips for a More Efficient Data Center
5 Tips for a More Efficient Data Center
 
SSD-Bondi.pptx
SSD-Bondi.pptxSSD-Bondi.pptx
SSD-Bondi.pptx
 
VMworld 2014: Databases in a Virtualized World
Why does my choice of storage matter with Cassandra?

  • 1. Why does my choice of storage matter with Cassandra? Johnny Miller, Solutions Architect @CyanMiller www.linkedin.com/in/johnnymiller
  • 2. Quote: “The single biggest predictor of success or failure with a Cassandra deployment is in storage choice” (Patrick McFadin, Chief Evangelist for Cassandra, @PatrickMcFadin) ©2014 DataStax Confidential. Do not distribute without consent.
  • 3. Cassandra Storage Engine
  • 4. Inserts/Updates
      • Memtable (read/write): organized in sorted order by row key and flushed to SSTables sequentially.
      • SSTable (immutable, read only): an ordered map of key-value pairs.
      • Commit log (write only): an append-only file structure that provides interim durability for writes before they are flushed to SSTables.
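The write path on this slide can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the class name `TinyLSM`, the `memtable_limit` parameter, and the in-memory lists standing in for on-disk files are all assumptions made for illustration.

```python
class TinyLSM:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []      # append-only record of un-flushed writes
        self.memtable = {}        # mutable, in-memory
        self.sstables = []        # immutable sorted tuples, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        # Durability first: append to the commit log, then update the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable is written out once, sorted by key, and never modified again.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()   # flushed writes no longer need the log

    def read(self, key):
        # Check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            found = dict(sstable)
            if key in found:
                return found[key]
        return None
```

Note that an update never touches an existing SSTable: the new value simply lands in a newer structure and shadows the old one on read.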
  • 5. Reads
  • 6. Deletes
      • Unlike most databases, deleted data is not immediately removed from disk.
      • A marker called a tombstone is written to indicate that the column is deleted.
      • Tombstones exist for a configurable period of time, and are only deleted from disk via compaction after that time has expired.
      • Deletes are just as fast as inserts :)
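A minimal sketch of the tombstone behaviour described on this slide, assuming a plain dictionary as the table and cells represented as small dictionaries (both are illustrative choices, not Cassandra's on-disk format). The grace period uses the value of Cassandra's `gc_grace_seconds` default, 864000 seconds (10 days).

```python
GC_GRACE_SECONDS = 864000  # default gc_grace_seconds in Cassandra (10 days)


def delete(table, key, now):
    """A delete is just another write: it records a timestamped tombstone."""
    table[key] = {"tombstone": True, "written_at": now}


def compact(table, now, gc_grace=GC_GRACE_SECONDS):
    """Keep live cells, plus tombstones still inside the grace period."""
    return {
        key: cell
        for key, cell in table.items()
        if not (cell.get("tombstone") and now - cell["written_at"] >= gc_grace)
    }
```

Because `delete` only writes a marker, it costs the same as an insert; the space is reclaimed later, when a compaction runs after the grace period has passed.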
  • 7. Compaction
      • Regular compaction of data in Cassandra is essential for a healthy and performant cluster.
      • SSTables are immutable, so compaction is needed to:
          • Get rid of duplicate/overwritten data
          • Drop deleted data and tombstones
      • Data in SSTables is sorted by partition key, so while SSTables are being consolidated the disk I/O is not random.
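Why sorted SSTables lead to non-random I/O can be seen in a small merge sketch. It assumes each SSTable is a list of `(key, timestamp, value)` rows sorted by key; that row shape and the function name `compact_sstables` are illustrative, not Cassandra's format.

```python
import heapq


def compact_sstables(sstables):
    """Merge key-sorted SSTables in one pass, keeping the newest row per key."""
    merged = []
    # heapq.merge reads every input front to back, so each file is scanned once.
    rows = heapq.merge(*sstables, key=lambda row: (row[0], row[1]))
    for key, timestamp, value in rows:
        if merged and merged[-1][0] == key:
            # Same key again: rows arrive in timestamp order, so this one is newer.
            merged[-1] = (key, timestamp, value)
        else:
            merged.append((key, timestamp, value))
    return merged


old = [("a", 1, "x"), ("c", 1, "z")]
new = [("a", 2, "y"), ("b", 2, "w")]
print(compact_sstables([old, new]))
```

Each input is consumed strictly in order and the output is produced in order, which is the access pattern the slide calls "not random".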
  • 8. Compaction Strategies
      • There is a choice of three strategies Cassandra can use for compaction, and all have different disk I/O profiles and capacity requirements.
      • SizeTieredCompactionStrategy (default)
          • Causes bursts of I/O activity while a compaction is in progress.
          • These I/O bursts can negatively affect read-heavy workloads, but typically do not impact write performance.
          • Data is highly likely to be spread across multiple SSTables, i.e. multiple disk seeks per read.
      • LeveledCompactionStrategy
          • ~90% of the time, data will be in only a single SSTable, i.e. minimal disk seeks.
          • However, disk I/O is significantly higher than with size-tiered compaction, in order to guarantee how many SSTables data may be spread across.
          • Because of the high disk I/O, it is rarely appropriate on traditional HDDs.
      • DateTieredCompactionStrategy (C* 2.0.11+ and 2.1.1+)
          • Stores data written within a certain period of time in the same SSTable.
          • Can store data that has been set to expire using TTL in an SSTable with other data scheduled to expire at approximately the same time, so the whole SSTable can simply be dropped without any compaction!
  • 9. Choice of storage matters
      • Most databases rewrite modified data in place; writes are buffered and then flushed to disk as random writes.
      • With Cassandra:
          • Disk writes are typically sequential, append-only operations.
          • On-disk tables are written in sorted order, so compaction running time increases linearly with the amount of data.
      • So choice of storage is pretty important!
  • 10. Disks and Configuration Options
  • 11. Quote: “For many applications, we are no longer constrained by hard drive capacity, but by seek speeds. Essentially, a 7200 RPM hard drive is capable of delivering approximately 100 seeks per second, and this has not changed in more than 10 years, even as disk capacity has been doubling every 18–24 months. In fact, if you have a big data application which required a half a petabyte of storage, what had previously required 1024 disk drives when we were using 500 GB drives, now that 3 TB disks are available, only 171 disk drives are needed. So a storage array capable of storing half a petabyte is now capable of 80% fewer seeks.” (Ted Ts’o, maintainer of the ext4 file system in the Linux kernel)
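The arithmetic behind the quote can be checked directly. The sketch below assumes "half a petabyte" means 512 TB counted in decimal gigabytes, which is the figure implied by 1024 drives of 500 GB; the function and constant names are illustrative.

```python
import math

HALF_PETABYTE_GB = 512 * 1000   # assumption: 512 TB, i.e. 1024 x 500 GB
SEEKS_PER_DRIVE = 100           # per second, for a 7200 RPM drive (from the quote)


def drives_needed(total_gb, drive_gb):
    """Whole drives required to hold total_gb."""
    return math.ceil(total_gb / drive_gb)


old_drives = drives_needed(HALF_PETABYTE_GB, 500)    # 1024
new_drives = drives_needed(HALF_PETABYTE_GB, 3000)   # 171

old_seeks = old_drives * SEEKS_PER_DRIVE   # 102400 seeks per second
new_seeks = new_drives * SEEKS_PER_DRIVE   # 17100 seeks per second
reduction = 1 - new_seeks / old_seeks      # about 0.83

print(old_drives, new_drives, round(reduction, 2))
```

Under these assumptions the array loses about 83% of its aggregate seek capacity, consistent with the "80% fewer seeks" in the quote.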
  • 12. Hard Drive/Spinning Disk (diagram labels: “This part actually has to move!” and “This bit spins around very fast”)
  • 13. So what can we do?
      • Memory?
          • Caching can help, but the hit rate has to be extremely high to mitigate the mechanical latency of spinning disks.
      • Get rid of the moving parts!
          • Mechanical media will never be able to keep up under load.
          • Today’s databases service multiple users with different access patterns.
          • A relatively small number of concurrent disk reads can result in seconds of latency.
      • SSDs don’t have moving parts
          • SSDs can eliminate entire classes of problems.
          • With Cassandra in particular, you will save a lot of money on staff resources by investing in SSDs up front.
          • Compactions can be tough on flash, but it’s not as bad as you think.
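A rough calculation shows why the cache hit rate has to be so high. The latencies below are assumptions for illustration: 10 ms per disk read (one seek on a drive that manages about 100 seeks per second) and 0.1 ms for a read served from memory.

```python
def average_read_latency_ms(hit_rate, cache_ms=0.1, disk_ms=10.0):
    """Expected read latency for a given cache hit rate (0.0 to 1.0)."""
    return hit_rate * cache_ms + (1.0 - hit_rate) * disk_ms


for hit_rate in (0.50, 0.90, 0.99):
    latency = average_read_latency_ms(hit_rate)
    print(f"hit rate {hit_rate:.0%}: {latency:.3f} ms average")
```

With these numbers, even a 90% hit rate leaves the average read at roughly 1.09 ms, about ten times slower than a cache hit, because the remaining 10% of reads dominate. The model also ignores queueing, which makes concurrent reads on a spinning disk worse still.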
  • 14. The best way to do it: SSDs!
      • What is an SSD?
          • Solid State Drive
          • Bits stored in NAND flash memory
          • No moving parts
          • “Seeks” 2-3 orders of magnitude faster than spinning disk
      • What’s the catch?
          • Smaller capacity
          • More expensive
          • Flash wears out. In practice, this is not a problem; if it makes you nervous, keep spares.
  • 15. The IO Scheduler (don’t forget to tune the OS for SSDs)
      • NOOP: use this scheduler if you know another IO device (like a RAID card) will be doing its own IO scheduling. The NOOP scheduler is just a pass-through.
      • Deadline: otherwise, use the deadline scheduler.
      • Tell the OS the drive is non-rotational.
      • Tune read-ahead way down: start at 0 and work your way up.
      • Commands (run as root, substituting the device name for <drive>):
          echo deadline > /sys/block/<drive>/queue/scheduler
          echo 0 > /sys/block/<drive>/queue/rotational
          blockdev --setra 0 /dev/<drive>
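Before changing these settings it can help to read the current values back. The sketch below assumes a Linux host that exposes the usual sysfs files under `/sys/block/<drive>/queue/`; the helper names `active_scheduler` and `queue_settings` are hypothetical, and scheduler names vary by kernel version.

```python
from pathlib import Path


def active_scheduler(scheduler_text):
    """Return the active scheduler, which sysfs marks with square brackets."""
    for name in scheduler_text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def queue_settings(drive, sys_block="/sys/block"):
    """Read scheduler, rotational flag and read-ahead for one block device."""
    queue = Path(sys_block) / drive / "queue"
    return {
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
        "rotational": (queue / "rotational").read_text().strip() == "1",
        # read_ahead_kb is in KiB, whereas blockdev --setra counts 512-byte sectors.
        "read_ahead_kb": int((queue / "read_ahead_kb").read_text()),
    }
```

For example, `queue_settings("sda")` on a tuned SSD would be expected to report the deadline (or noop) scheduler, `rotational` as `False`, and a small read-ahead value.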
  • 16. Use SSDs
      • More flexibility and a substantial performance benefit
          • Typically 10x the performance for less than 2x the cost (potentially lower) when compared with HDDs.
      • You can use LeveledCompactionStrategy
          • SSDs can scale up to cope with larger compaction overheads while simultaneously serving many random reads.
      • Netflix found that they could halve the total system cost to achieve the same level of throughput.
          • Additionally, the mean read request latency was reduced from 10ms to 2.2ms and the 99th percentile request latency was reduced from 65ms to 10ms.
          • http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
  • 17. The worst way to do it
      • Shared Storage
          • Cassandra is a shared-nothing architecture with no single point of failure.
          • Adding shared storage adds a single point of failure.
          • Irrespective of the terrible performance, this alone is enough reason not to do it.
  • 18. Local storage configuration
      • RAID or JBOD?
          • RAID: Redundant Array of Independent (or Inexpensive) Disks
          • JBOD: Just a Bunch Of Disks
      • RAID
          • Common Cassandra RAID levels are 0, 1 and 10.
          • RAID-0 is most common, but means all data on the node must be rebuilt from other nodes when a drive fails.
      • JBOD
          • Drives are listed individually in cassandra.yaml.
          • Failed drives can be replaced individually.
  • 19. How to choose between RAID or JBOD?
      • Performance
          • For SSDs, not so much…
          • Compactions are usually throttled significantly below bus speed.
          • So a single SSD usually has sufficient throughput.
          • Throughput is really the only advantage RAID buys.
      • Manageability
          • Pick the option that best fits the deployment scenario.
          • If using SSDs, and drives can be replaced, choose JBOD.
          • Otherwise, RAID is probably the right choice.
  • 20. How to choose between RAID or JBOD? (continued)
      • Cloud provider
          • EC2 ephemeral SSDs can’t be replaced, so use RAID (and don’t use EBS).
          • GCE persistent SSD volumes can be replaced, so JBOD is useful.
      • Not all drives are hot-swappable
          • PCIe devices can’t conveniently be replaced.
      • SSD spares for JBOD mode
          • Keep spare SSDs online, but not in use.
          • Allows the node to be easily brought back online with a quick config change.
  • 21. Comparison Data
  • 22. FusionIO ioDrive II (chart: read and write latency, in microseconds)
  • 23. PNY XLR8 SSD (consumer grade MLC)
  • 24. Samsung 840 Pro SSD (consumer grade MLC)
  • 25. 7200RPM SATA
  • 26. 7200RPM SAS
  • 27. 10K SATA
  • 28. 15K SAS
  • 29. All Drives
  • 30. All SSDs
  • 31. Conclusion
      • Using SSDs is a good idea
          • Better response times
          • Less variance in performance
          • Significantly higher throughput, so fewer servers needed
  • 32. Thank You. We power the big data apps that transform business.