Cassandra Summit 1.0
    Performance Tuning


      Brandon Williams

           Riptano, Inc.
    brandon@riptano.com
 brandonwilliams@apache.org
             @faltering
        driftx on freenode

       August 10, 2010




  Brandon Williams   Cassandra Summit 1.0
Tuning Writes
                                        Tuning Reads


Making writes faster


      Use a separate IO device for the commit log.
          Hard to accomplish in the cloud
          Rackspace: one IO device, but it’s persistent (RAID array underneath)
          EC2: EBS is slow, local disk is ephemeral
              You could put the commit log on the ephemeral drive anyway,
              at the price of durability
              But then, why have a commit log at all?
              Maybe you can disable it in 0.7/0.8
          Real servers: one RAID array, bad RAID options
          Will anyone ever offer SSDs?




What else?




     concurrent writers (concurrent readers for reads)
        increase if you have lots of cores
     memtable flush writers
        increase if you have lots of IO bandwidth




What are all these options?




      memtable_throughput_in_mb
      memtable_operations_in_millions
      memtable_flush_after_mins
      do bigger memtables improve writes?
          no, but they can improve reads
          what?




Compaction: the slayer of reads



      a necessary evil
      IO contention hell
      you can reduce compaction priority in 0.6.4 or later
          -Dcassandra.compaction.priority=1
          constantly outstripping it means you need more nodes
          reducing the priority affects CPU usage, not IO
      avoid reading from slow hosts
          dynamic snitch
               accrual failure detector




Compaction (cont’d)




      bigger memtables absorb more overwrites
          fewer SSTables make for more efficient compaction
      if you are write-once then read-only, you *could* turn it off
          merge-on-read and Bloom filters save you
          someday, you’ll want to repair
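
A toy sketch of why bigger memtables absorb more overwrites and leave fewer SSTables to compact. This is not Cassandra's memtable: the dict model, the `flush_sizes` helper, the entry-count limit, and the workload are all assumptions made for illustration.

```python
import random


def flush_sizes(writes, memtable_limit):
    """Return the number of columns in each flushed SSTable.

    The memtable is modeled as a dict keyed by (row, column); an overwrite
    replaces the existing entry instead of adding a new one. A flush happens
    when the dict reaches memtable_limit distinct entries (an assumption;
    real thresholds are expressed in MB, operations, and minutes).
    """
    memtable = {}
    flushed = []
    for key, value in writes:
        memtable[key] = value
        if len(memtable) >= memtable_limit:
            flushed.append(len(memtable))
            memtable = {}
    if memtable:
        flushed.append(len(memtable))
    return flushed


# Hypothetical workload: 10,000 writes spread over only 500 distinct columns.
rng = random.Random(42)
writes = [((rng.randrange(50), rng.randrange(10)), i) for i in range(10_000)]

small = flush_sizes(writes, memtable_limit=100)
large = flush_sizes(writes, memtable_limit=400)
print(len(small), "SSTables,", sum(small), "columns written with a small memtable")
print(len(large), "SSTables,", sum(large), "columns written with a large memtable")
```

With an overwrite-heavy workload like this one, the larger memtable flushes less often and writes fewer columns in total, which is less work for compaction later.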




Know your read pattern




      how much data is in the working set?
      disk is slow: you want that in memory
          sometimes you can’t afford the cost
      how many reads are repeats?
      doing lots of random IO within a row?
          column_index_size_in_kb




Caches


     on a cold hit, each row requires two seeks
     one to find the row’s position in the index
         key cache eliminates this
     another to read the row
         row cache eliminates this, too
     columns in the row are contiguous afterwards
         make fat rows
         but not too fat, since the row is the unit of distribution
     the OS file cache
         use a good OS
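
The two-seek model above can be turned into rough arithmetic. This sketch assumes a row cache hit costs no seeks, a key cache hit costs one, and a miss in both costs two; it ignores the OS file cache entirely. The function name and the hit rates are made up for illustration.

```python
def expected_seeks(row_hit_rate, key_hit_rate):
    """Expected disk seeks per read under the two-seek model.

    row_hit_rate: fraction of reads served from the row cache (0 seeks).
    key_hit_rate: fraction of row cache misses that hit the key cache
                  (1 seek, since the index lookup is skipped).
    Everything else pays both seeks.
    """
    row_miss = 1.0 - row_hit_rate
    return row_miss * (key_hit_rate * 1 + (1.0 - key_hit_rate) * 2)


for row_hr, key_hr in [(0.0, 0.0), (0.0, 1.0), (0.5, 0.9), (1.0, 0.0)]:
    seeks = expected_seeks(row_hr, key_hr)
    print(f"row cache {row_hr:.0%}, key cache {key_hr:.0%}: {seeks:.2f} seeks/read")
```

Plugging in hit rates observed on a real node gives a quick feel for which cache is worth growing.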




Caching Strategies
      key cache
          excellent bang for your buck
          half your seeks are gone
          a lot of keys fit in a relatively small amount of memory
      row cache
          all seeks are gone
          but more heap usage = more GC pressure
          trying to use 32GB of row cache will wreck you
          estimating the correct size can be difficult
               use the average row size in cfstats as a starting point
               in 0.7, each SSTable has a persistent row size histogram
               the penalty for being wrong can be catastrophic: OOM
               can’t be done programmatically in Java, or Cassandra would do it for you
               this is why you can’t set an absolute amount in bytes
          if you enable it on very fat rows, it can be bad
               keep your indexes in a different column family
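
A back-of-envelope sketch of the row cache sizing arithmetic, starting from the average row size reported by cfstats. The function names are hypothetical, and `overhead_factor` is a made-up placeholder for JVM object overhead, not a measured Cassandra figure; treat the result as a starting point to verify under load.

```python
def estimate_row_cache_bytes(rows_cached, avg_row_size_bytes, overhead_factor=2.0):
    """Rough heap estimate for a row cache holding rows_cached rows.

    overhead_factor is an assumed multiplier for in-heap object overhead;
    the real value depends on the data and must be measured.
    """
    return int(rows_cached * avg_row_size_bytes * overhead_factor)


def max_rows_for_budget(budget_bytes, avg_row_size_bytes, overhead_factor=2.0):
    """How many rows fit in a heap budget under the same assumption."""
    return int(budget_bytes // (avg_row_size_bytes * overhead_factor))


# Hypothetical numbers: 2 KB average row size, 100,000 cached rows.
avg_row = 2048
print(estimate_row_cache_bytes(100_000, avg_row), "bytes for 100,000 rows")
print(max_rows_for_budget(512 * 1024 * 1024, avg_row), "rows fit in 512 MB")
```

An average hides fat rows, so a column family with a skewed row size distribution can use far more heap than this estimate suggests.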


Caching Strategies (cont’d)



      OS file cache: it’s free
          no size estimation needed
          mmap is great
               unless it makes you swap
               switch to mmap index only
               why do you have swap enabled, anyway?
      Absolute numbers vs percentages
          percentages can be an OOM time bomb
          harder to calculate how much memory the cache will use




Caching Strategies (cont’d)


      lookup order:
          row cache
          key cache
          disk (file cache?)
      sizing your caches:
          large key cache
          smaller row cache for very hot rows
          leave the rest to the OS
      don’t make your heap larger than needed
      monitor hit rates via JMX
          actually, monitor everything you can
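
The lookup order and the "small row cache for very hot rows" advice can be sketched as a toy read path with hit counters. This is an illustrative model, not Cassandra's code: the `ReadPath` class, the `hot_keys` set, and the unbounded dict caches are all assumptions.

```python
class ReadPath:
    """Toy model of the lookup order: row cache, then key cache, then disk."""

    def __init__(self, data, index, hot_keys=()):
        self.data = data              # position -> row, stands in for the data file
        self.index = index            # key -> position, stands in for the index file
        self.hot_keys = set(hot_keys)  # only these rows are admitted to the row cache
        self.row_cache = {}
        self.key_cache = {}
        self.stats = {"reads": 0, "row_hits": 0, "key_hits": 0, "seeks": 0}

    def read(self, key):
        self.stats["reads"] += 1
        if key in self.row_cache:
            self.stats["row_hits"] += 1
            return self.row_cache[key]
        if key in self.key_cache:
            self.stats["key_hits"] += 1
            position = self.key_cache[key]
        else:
            self.stats["seeks"] += 1  # first seek: find the row's position in the index
            position = self.index[key]
            self.key_cache[key] = position
        self.stats["seeks"] += 1      # second seek: read the row itself
        row = self.data[position]
        if key in self.hot_keys:
            self.row_cache[key] = row
        return row


index = {"a": 0, "b": 1}
data = {0: "row-a", 1: "row-b"}
path = ReadPath(data, index, hot_keys={"a"})
for key in ["a", "a", "b", "b"]:
    path.read(key)
print(path.stats)
```

The counters play the role of the hit rates you would watch via JMX: a low key cache hit rate shows up directly as extra seeks.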



Test, Measure, Tweak, Repeat




      use stress.py as a baseline
          make sure you have multiprocessing
      move to real world data




Settings you don’t need to touch




      commitlog_rotation_threshold_in_mb
      SlicedBufferSizeInKB
      FlushIndexBufferSizeInMB




The End




  Questions?




               Brandon Williams   Cassandra Summit 1.0

Contenu connexe

En vedette

Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1DataStax Academy
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in CassandraShogo Hoshii
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Eric Evans
 
Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)Distributed Counters in Cassandra (Cassandra Summit 2010)
Distributed Counters in Cassandra (Cassandra Summit 2010)kakugawa
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loadingalex_araujo
 
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.Ambiente Livre
 
Apache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataApache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataGuido Schmutz
 
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...DataStax
 
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...DataStax
 
Cassandra and Spark
Cassandra and Spark Cassandra and Spark
Cassandra and Spark datastaxjp
 
Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Markus Höfer
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0J.B. Langston
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...DataStax
 
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...DataStax
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cachergrebski
 
data science toolkit 101: set up Python, Spark, & Jupyter
data science toolkit 101: set up Python, Spark, & Jupyterdata science toolkit 101: set up Python, Spark, & Jupyter
data science toolkit 101: set up Python, Spark, & JupyterRaj Singh
 

En vedette (20)

Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in Cassandra
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Cassandra Summit 2010 Performance Tuning

  • 1. Cassandra Summit 1.0 Performance Tuning Brandon Williams Riptano, Inc. brandon@riptano.com brandonwilliams@apache.org @faltering driftx on freenode August 10, 2010 Brandon Williams Cassandra Summit 1.0
  • 2-9. Tuning Writes / Tuning Reads: Making writes faster
      Use a separate IO device for the commit log.
          Hard to accomplish in the cloud
          Rackspace: one IO device, but it’s persistent (RAID array underneath)
          EC2: EBS is slow, local disk is not persistent
              You could put the commitlog on the ephemeral drive anyway, at the price of durability
              But then, why have a commitlog at all?
              Maybe you can disable it in 0.7/0.8
          Realservers: one RAID array, bad RAID options
          Will anyone ever offer SSDs?
  • 10-11. What else?
      concurrent writers (concurrent readers for reads)
          increase if you have lots of cores
      memtable_flush_writers
          increase if you have lots of IO
  • 12-14. What are all these options?
      memtable_throughput_in_mb
      memtable_operations_in_millions
      memtable_flush_after_mins
      bigger memtables improve writes?
          no, but they can improve reads
          what?
  • 15-23. Compaction: the slayer of reads
      a necessary evil
      IO contention hell
      you can reduce compaction priority in 0.6.4 or later
          -Dcassandra.compaction.priority=1
          constantly outstripping it means you need more nodes
          reducing the priority affects CPU usage, not IO
      avoid reading from slow hosts
          dynamic snitch
          accrual failure detector
  • 24-28. Compaction (cont’d)
      bigger memtables absorb more overwrites
      fewer sstables make for more efficient compaction
      if you are write once then read-only, you *could* turn it off
          merge-on-read and bloom filters save you
          someday, you’ll want to repair
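The "bigger memtables absorb more overwrites" point can be illustrated with a toy simulation. This is only a sketch: the memtable is modeled as a plain dict flushed after a fixed number of operations (loosely in the spirit of memtable_operations_in_millions), and `simulate_flushes`, the key range, and the thresholds are all made up for illustration rather than taken from Cassandra.

```python
import random

def simulate_flushes(writes, ops_per_flush):
    """Flush a dict-based toy memtable every `ops_per_flush` writes.

    Returns (sstable_count, rows_flushed).
    """
    memtable = {}
    ops = 0
    sstable_sizes = []
    for key, value in writes:
        memtable[key] = value      # an overwrite replaces the row in memory
        ops += 1
        if ops >= ops_per_flush:
            sstable_sizes.append(len(memtable))   # one flush = one sstable
            memtable = {}
            ops = 0
    if memtable:
        sstable_sizes.append(len(memtable))
    return len(sstable_sizes), sum(sstable_sizes)

# 20,000 writes spread over only 1,000 distinct keys, so most are overwrites.
rng = random.Random(42)
writes = [(rng.randrange(1000), i) for i in range(20000)]

small = simulate_flushes(writes, ops_per_flush=100)
large = simulate_flushes(writes, ops_per_flush=2000)
print("small memtable (sstables, rows flushed):", small)
print("large memtable (sstables, rows flushed):", large)
```

With the larger threshold, the same write stream produces fewer sstables holding fewer total rows, which is less work for compaction to merge later.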
  • 29-34. Know your read pattern
      how much data is in the working set?
          disk is slow: you want that in memory
          sometimes you can’t afford the cost
      how many reads are repeats?
      doing lots of random IO within a row?
          column_index_size_in_kb
  • 35-44. Caches
      on a cold hit, each row requires two seeks
          one to find the row’s position in the index
              key cache eliminates this
          another to read the row
              row cache eliminates this, too
          columns in the row are contiguous afterwards
              make fat rows
              but not too fat, since the row is the unit of distribution
      the OS file cache
          use a good OS
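The seek accounting above (two seeks cold, one with a key cache hit, none with a row cache hit) can be turned into a rough back-of-the-envelope estimate. The function name and the assumption that the key cache hit rate is measured only on reads that already missed the row cache are illustrative choices, not something the slides specify.

```python
def expected_seeks(row_cache_hit_rate, key_cache_hit_rate):
    """Average disk seeks per row read, ignoring the OS file cache.

    key_cache_hit_rate is assumed to apply to reads that missed the row cache.
    """
    row_miss = 1.0 - row_cache_hit_rate
    key_miss = 1.0 - key_cache_hit_rate
    # Every row cache miss costs one seek to read the row,
    # plus one index seek when the key cache misses as well.
    return row_miss * (1.0 + key_miss)

print(expected_seeks(0.0, 0.0))   # cold: both seeks
print(expected_seeks(0.0, 1.0))   # perfect key cache: the index seek is gone
print(expected_seeks(1.0, 0.0))   # perfect row cache: no seeks at all
```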
  • 45-51. Caching Strategies
      key cache
          excellent bang for your buck
          half your seeks are gone
          a lot of keys fit in a relatively small amount of memory
      row cache
          all seeks are gone
          but more heap usage = more GC pressure
              trying to use 32GB of row cache will wreck you
          estimating the correct size can be difficult
              use the average row size in cfstats as a starting point
              in 0.7, each SSTable has a persistent row size histogram
              the penalty for being wrong can be catastrophic: OOM
              can’t be done programmatically in Java, or Cassandra would do it for you
              this is why you can’t set an absolute amount in bytes
          if you enable it on very fat rows, it can be bad
              keep your indexes in a different column family
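Since the row cache is sized in rows rather than bytes, one way to get a starting point is to work backwards from a heap budget and the average row size reported by cfstats. The sketch below is only arithmetic under stated assumptions: `max_rows_for_budget` is a hypothetical helper, and `overhead_factor` is a guessed multiplier for in-heap object overhead, not a measured Cassandra figure.

```python
def max_rows_for_budget(heap_bytes, cache_fraction_of_heap, avg_row_bytes,
                        overhead_factor=2.0):
    """Rough upper bound on rows to cache within a share of the heap.

    overhead_factor is an assumed fudge factor for Java object overhead.
    """
    budget = heap_bytes * cache_fraction_of_heap
    return int(budget // (avg_row_bytes * overhead_factor))

# Example: 8 GiB heap, a quarter of it for the row cache, 2 KiB average rows.
rows = max_rows_for_budget(8 * 1024**3, 0.25, 2048)
print(rows)
```

Treat the result as a ceiling to test against, not a recommendation: the average hides fat rows, and being wrong in the wrong direction means OOM.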
  • 52-58. Caching Strategies (cont’d)
      OS file cache: it’s free
          no size estimation needed
          mmap is great unless it makes you swap
              switch to mmap index only
              why do you have swap enabled, anyway?
      Absolute numbers vs percentages
          percentages can be an OOM time bomb
          harder to calculate how much memory the cache will use
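Why a percentage can be a time bomb is easy to see with made-up numbers: the cache grows with the data set, while an absolute row count stays put. Both helper functions and all figures below are hypothetical.

```python
def cache_bytes_for_fraction(total_rows, fraction, avg_row_bytes):
    """Approximate cache footprint when caching a fraction of all rows."""
    return int(total_rows * fraction) * avg_row_bytes

def cache_bytes_for_count(total_rows, count, avg_row_bytes):
    """Approximate cache footprint when caching a fixed number of rows."""
    return min(total_rows, count) * avg_row_bytes

avg_row_bytes = 2048
for total_rows in (1_000_000, 10_000_000):
    pct = cache_bytes_for_fraction(total_rows, 0.10, avg_row_bytes)
    fixed = cache_bytes_for_count(total_rows, 100_000, avg_row_bytes)
    print(total_rows, "rows: 10% ->", pct, "bytes; 100k rows ->", fixed, "bytes")
```

The data set growing tenfold makes the percentage-based cache ten times larger without anyone touching the configuration.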
  • 59-62. Caching Strategies (cont’d)
      lookup order:
          row cache
          key cache
          disk (file cache?)
      sizing your caches:
          large key cache
          smaller row cache for very hot rows
          leave the rest to the OS
          don’t make your heap larger than needed
      monitor hit rates via JMX
          actually, monitor everything you can
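The lookup order and the hit rates worth watching can be sketched with a toy read path. `ToyReadPath` is a hypothetical class with unbounded dict caches; it is not Cassandra's implementation and does not talk to JMX, it only shows which counter moves at each step.

```python
class ToyReadPath:
    """Toy model of the lookup order: row cache, then key cache, then disk."""

    def __init__(self, rows, use_row_cache=True):
        self.index = {key: pos for pos, key in enumerate(rows)}  # "on-disk" index
        self.data = list(rows.values())                          # "on-disk" rows
        self.use_row_cache = use_row_cache
        self.row_cache = {}
        self.key_cache = {}
        self.reads = 0
        self.row_hits = 0
        self.key_lookups = 0
        self.key_hits = 0
        self.seeks = 0

    def read(self, key):
        self.reads += 1
        if key in self.row_cache:            # 1. row cache: no disk access
            self.row_hits += 1
            return self.row_cache[key]
        self.key_lookups += 1
        if key in self.key_cache:            # 2. key cache: skip the index seek
            self.key_hits += 1
            position = self.key_cache[key]
        else:                                # 3. disk: seek into the index
            self.seeks += 1
            position = self.index[key]
            self.key_cache[key] = position
        self.seeks += 1                      # seek to read the row itself
        row = self.data[position]
        if self.use_row_cache:
            self.row_cache[key] = row
        return row

    def row_hit_rate(self):
        return self.row_hits / self.reads if self.reads else 0.0

    def key_hit_rate(self):
        return self.key_hits / self.key_lookups if self.key_lookups else 0.0

store = ToyReadPath({"a": {"col": 1}, "b": {"col": 2}}, use_row_cache=False)
for key in ("a", "a", "b", "a"):
    store.read(key)
print("seeks:", store.seeks, "key cache hit rate:", store.key_hit_rate())
```

In this toy, the key cache hit rate is measured only over reads that missed the row cache; a real deployment would read the equivalent counters from its monitoring.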
  • 63-65. Test, Measure, Tweak, Repeat
      use stress.py as a baseline
          make sure you have multiprocessing
      move to real world data
  • 66. Settings you don’t need to touch
      commitlog_rotation_threshold_in_mb
      SlicedBufferSizeInKB
      FlushIndexBufferSizeInMB
  • 67. The End
      Questions?