Rethinking Topology in Cassandra


                            ApacheCon North America
                                February 28, 2013



                                   Eric Evans
                               eevans@acunu.com
                                  @jericevans


Thursday, February 28, 13                             1
DHT 101



DHT 101
                             partitioning
                                Z   A




DHT 101
                                    partitioning



                                Z                  A


                            Y                          B


                                                   C



DHT 101
                                    partitioning



                                Z                  A


                            Y       Key = Aaa          B


                                                   C



DHT 101
                                    replica placement



                                Z                       A


                            Y         Key = Aaa             B


                                                        C
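
As a rough illustration of the partitioning and replica placement pictured above, here is a minimal Python sketch. The node names come from the slides; the MD5-based hash, the evenly spaced token values, and the helper names `token_for` and `replicas_for` are assumptions made for illustration, not Cassandra's actual code. A key is hashed onto the ring, the first node at or after that position owns it, and the next nodes clockwise hold the remaining replicas.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # assumed token space for this sketch


def token_for(key: str) -> int:
    """Hash a key onto the ring (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def replicas_for(key_token: int, ring: list[tuple[int, str]], rf: int) -> list[str]:
    """Return up to rf distinct nodes, walking clockwise from the key's position.

    ring is a list of (token, node) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    # First token at or after the key; wrap to the start of the ring if none.
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    chosen: list[str] = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == rf:
            break
    return chosen


# Hypothetical, evenly spaced tokens for the five nodes shown on the slides.
nodes = ["A", "B", "C", "Y", "Z"]
ring = [(i * RING_SIZE // len(nodes), name) for i, name in enumerate(nodes)]

print(replicas_for(token_for("Aaa"), ring, rf=3))
```

With a replication factor of 3, every key lands on three consecutive nodes, so the loss of one node leaves two copies reachable.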



DHT 101
                                       consistency




                        Consistency
                        Availability
                        Partition tolerance


DHT 101
                            scenario: consistency level = one


                                 A
                                                                W

                                      ?



                                  ?



DHT 101
                            scenario: consistency level = all


                                 A
                                                                R

                                      ?



                                  ?



DHT 101
                              scenario: quorum write


                              A
                                                       W

                    R+W > N         B



                                ?



DHT 101
                              scenario: quorum read


                              ?



                    R+W > N        B


                                                      R
                               C
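
The R+W > N condition on the two quorum slides can be checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that a quorum is a simple majority of the N replicas.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def overlaps(r: int, w: int, n: int) -> bool:
    """True when any r replicas read must include one of the w replicas written."""
    return r + w > n


n = 3
print(quorum(n))                          # 2
print(overlaps(quorum(n), quorum(n), n))  # True: quorum reads see quorum writes
print(overlaps(1, 1, n))                  # False: ONE/ONE may miss the latest write
print(overlaps(3, 1, n))                  # True: reading ALL covers a write at ONE
```

When the condition holds, at least one replica in every read set has the most recent write.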



Awesome, yes?




Well...




Problem:
                            Poor load distribution




Distributing Load

                                 Z       A


                             Y               B


                                         C
                                     M

Distributing Load

                                 Z       A
                                         A1


                             Y                B


                                         C
                                     M

Problem:
                            Poor data distribution




Distributing Data
                                    A



                                          C
                             D




                                    B

Distributing Data
                                     A
                                 E


                                          C
                             D




                                     B

Distributing Data
                                      A   A
                                  E


                                              C
                             D
                                              C
                              D


                                      B B

Distributing Data
                                     A
                                 H       E


                                             C
                             D


                                 G       F
                                     B

Virtual Nodes



In a nutshell...

               host


                                               host


               host



Benefits
                     • Operationally simpler (no token management)
                     • Better distribution of load
                     • Concurrent streaming involving all hosts
                     • Smaller partitions mean greater reliability
                     • Supports heterogeneous hardware


Strategies

                     • Automatic sharding
                     • Fixed partition assignment
                     • Random token assignment



Strategy
                                        Automatic Sharding



                     • Partitions are split when data exceeds a
                            threshold
                     • Newly created partitions are relocated to a
                            host with lower data load
                     • Similar to sharding performed by Bigtable,
                            or Mongo auto-sharding
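
The split-and-relocate behaviour in the bullets above might be sketched as follows. This is a simplified model under stated assumptions: the `Partition` class, the function name `split_and_relocate`, and the halving rule are hypothetical and are not taken from Bigtable, MongoDB, or Cassandra.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    size: int  # amount of data held, in arbitrary units
    host: str


def split_and_relocate(partitions, hosts, threshold):
    """Split partitions larger than threshold; place each new half on the
    host with the lowest data load. The list is modified in place."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    load = {h: 0 for h in hosts}
    for p in partitions:
        load[p.host] += p.size

    i = 0
    while i < len(partitions):
        p = partitions[i]
        if p.size > threshold:
            half = p.size // 2
            p.size -= half              # the original keeps the larger half
            load[p.host] -= half
            target = min(hosts, key=lambda h: load[h])
            load[target] += half
            partitions.append(Partition(size=half, host=target))
            # i is not advanced, so p is re-checked in case it is still too big
        else:
            i += 1
    return partitions


parts = [Partition(100, "h1"), Partition(10, "h2")]
for p in split_and_relocate(parts, ["h1", "h2", "h3"], threshold=60):
    print(p)
```

Partition size stays bounded by the threshold, while the number of partitions grows with the amount of data.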



Strategy
                                     Fixed Partition Assignment

                     • Namespace divided into Q evenly-sized
                            partitions
                     • Q/N partitions assigned per host (where N
                            is the number of hosts)
                     • Joining hosts “steal” partitions evenly from
                            existing hosts.
                     • Used by Dynamo and Voldemort (described
                            in Dynamo paper as “strategy 3”)
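
A small sketch of this strategy, assuming a round-robin initial layout and a "steal from the fullest host" rule when a host joins. Both rules and the function names are illustrative assumptions; the slides do not specify how Dynamo or Voldemort choose which partitions to move.

```python
def initial_assignment(q: int, hosts: list[str]) -> dict[int, str]:
    """Assign q fixed partitions round-robin, about q/N per host."""
    return {p: hosts[p % len(hosts)] for p in range(q)}


def join(assignment: dict[int, str], new_host: str) -> dict[int, str]:
    """Let a joining host steal partitions evenly from the existing hosts
    until it owns about q/(N+1) of them."""
    result = dict(assignment)
    owned: dict[str, list[int]] = {}
    for p, h in sorted(result.items()):
        owned.setdefault(h, []).append(p)
    target = len(result) // (len(owned) + 1)
    for _ in range(target):
        # Always steal from whichever host currently owns the most.
        donor = max(owned, key=lambda h: len(owned[h]))
        result[owned[donor].pop()] = new_host
    return result


before = initial_assignment(12, ["a", "b", "c"])
after = join(before, "d")
print(sorted(after.items()))
```

The partition boundaries never change; only ownership of whole partitions moves between hosts.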


Strategy
                                    Random Token Assignment



                     • Each host assigned T random tokens
                     • T random tokens generated for joining
                            hosts; New tokens divide existing ranges
                     • Similar to libketama; Identical to Classic
                            Cassandra when T=1
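
To make the random strategy concrete, here is a hedged sketch that gives each host T random tokens and then measures how much of the ring each host ends up owning. The 2**64 token space, the seed, and the names `add_host` and `ownership` are assumptions for illustration only.

```python
import random


def add_host(ring: dict[int, str], host: str, t: int,
             rng: random.Random, token_space: int = 2**64) -> None:
    """Give host t random tokens; each new token splits an existing range."""
    added = 0
    while added < t:
        token = rng.randrange(token_space)
        if token not in ring:  # skip the (unlikely) collision
            ring[token] = host
            added += 1


def ownership(ring: dict[int, str], token_space: int = 2**64) -> dict[str, float]:
    """Fraction of the token space owned by each host. A token owns the
    range from the previous token (exclusive) up to itself."""
    tokens = sorted(ring)
    share: dict[str, float] = {}
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1]  # i == 0 wraps around to the last token
        width = (tok - prev) % token_space or token_space
        share[ring[tok]] = share.get(ring[tok], 0.0) + width / token_space
    return share


rng = random.Random(42)  # fixed seed so the sketch is repeatable
ring: dict[int, str] = {}
for name in ["host1", "host2", "host3"]:
    add_host(ring, name, t=256, rng=rng)
print({h: round(s, 3) for h, s in ownership(ring).items()})
```

Calling `add_host` with `t=1` for every host reproduces the one-token-per-host layout of the classic ring.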



Considerations

                     1. Number of partitions
                     2. Partition size
                     3. How 1 changes with more nodes and data
                     4. How 2 changes with more nodes and data




Evaluating
                Strategy          No. of partitions   Partition size
                Random            O(N)                O(B/N)
                Fixed             O(1)                O(B)
                Auto-sharding     O(B)                O(1)

                B ~ total data size, N ~ number of hosts


Evaluating
                     • Automatic sharding
                       • partition size constant (great)
                       • number of partitions scales linearly with
                            data size (bad)
                     • Fixed partition assignment
                     • Random token assignment

Evaluating
                     • Automatic sharding
                     • Fixed partition assignment
                       • Number of partitions is constant (good)
                       • Partition size scales linearly with data size (bad)
                       • Higher operational complexity (bad)
                     • Random token assignment


Evaluating
                     • Automatic sharding
                     • Fixed partition assignment
                     • Random token assignment
                       • Number of partitions scales linearly with
                         number of hosts (ok)
                       • Partition size increases with more data;
                         decreases with more hosts (good)


Evaluating


                     • Automatic sharding
                     • Fixed partition assignment
                     • Random token assignment



Cassandra



Configuration
                              conf/cassandra.yaml


                # Comma separated list of tokens,
                # (new installs only).
                initial_token: <token>,<token>,<token>

                or

                # Number of tokens to generate.
                num_tokens: 256



Configuration
                                    nodetool info

       Token           :    (invoke with -T/--tokens to see all 256 tokens)
       ID              :    64090651-6034-41d5-bfc6-ddd24957f164
       Gossip active   :    true
       Thrift active   :    true
       Load            :    92.69 KB
       Generation No   :    1351030018
       Uptime (seconds):    45
       Heap Memory (MB):    95.16 / 1956.00
       Data Center     :    datacenter1
       Rack            :    rack1
       Exceptions      :    0
       Key Cache       :    size 240 (bytes), capacity 101711872 (bytes ...
       Row Cache       :    size 0 (bytes), capacity 0 (bytes), 0 hits, ...




Configuration
                                              nodetool ring
       Datacenter: datacenter1
       ==========
       Replicas: 2

       Address      Rack   Status  State   Load      Owns    Token
                                                             9022770486425350384
       127.0.0.1    rack1  Up      Normal  97.24 KB  66.03%  -9182469192098976078
       127.0.0.1    rack1  Up      Normal  97.24 KB  66.03%  -9054823614314102214
       127.0.0.1    rack1  Up      Normal  97.24 KB  66.03%  -8970752544645156769
       127.0.0.1    rack1  Up      Normal  97.24 KB  66.03%  -8927190060345427739
       127.0.0.1    rack1  Up      Normal  97.24 KB  66.03%  -8880475677109843259
       127.0.0.1    rack1  Up      Normal  97.24 KB  66.03%  -8817876497520861779
       127.0.0.1    rack1  Up      Normal  97.24 KB  66.03%  -8810512134942064901
       127.0.0.1    rack1  Up      Normal  97.24 KB  66.03%  -8661764562509480261
       127.0.0.1    rack1  Up      Normal  97.24 KB  66.03%  -8641550925069186492
       127.0.0.1    rack1  Up      Normal  97.24 KB  66.03%  -8636224350654790732
       ...




Configuration
                                   nodetool status




       Datacenter: datacenter1
       =======================
       Status=Up/Down
       |/ State=Normal/Leaving/Joining/Moving
       -- Address   Load    Tokens Owns   Host ID                               Rack
       UN 10.0.0.1 97.2 KB 256     66.0% 64090651-6034-41d5-bfc6-ddd24957f164   rack1
       UN 10.0.0.2 92.7 KB 256     66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c   rack1
       UN 10.0.0.3 92.6 KB 256     67.7% e4eef159-cb77-4627-84c4-14efbc868082   rack1




Migration
                                    A




                            C               B




Migration
                            edit conf/cassandra.yaml and restart




                # Number of tokens to generate.
                num_tokens: 256




Migration
                            convert to T contiguous tokens in existing ranges

                            [Ring diagram: nodes A, B, C, with many tokens labeled A around the ring]
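
One way to picture this conversion step: each host keeps exactly the range it already owned, but that range is cut into T adjacent sub-ranges, each ending at its own token. The sketch below is an illustration under stated assumptions (the function name `split_range`, the equal-width split, and the small integer token space are hypothetical), not the code Cassandra runs on restart.

```python
def split_range(start: int, end: int, t: int, token_space: int) -> list[int]:
    """Split the ring range (start, end] into t contiguous sub-ranges and
    return their end tokens; the last token is always the original end."""
    width = (end - start) % token_space or token_space
    if t > width:
        raise ValueError("range too small for t distinct tokens")
    return [(start + width * i // t) % token_space for i in range(1, t + 1)]


# Hypothetical three-node ring on a token space of 1200.
old_ring = {400: "A", 800: "B", 0: "C"}
tokens = sorted(old_ring)
new_ring = {}
for i, tok in enumerate(tokens):
    prev = tokens[i - 1]  # wraps to the last token for i == 0
    for sub in split_range(prev, tok, t=4, token_space=1200):
        new_ring[sub] = old_ring[tok]

print(sorted(new_ring.items()))
```

No data has to move at this point, because every new token stays with the host that already owned the enclosing range.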




Migration
                                      shuffle

                            [Ring diagram: nodes A, B, C, with many tokens labeled A around the ring]




Shuffle

                     • Range transfers are queued on each host
                     • Hosts initiate transfer of ranges to self
                     • Pay attention to the logs!


Shuffle
                                          bin/shuffle
       Usage: shuffle [options] <sub-command>

       Sub-commands:
        create              Initialize a new shuffle operation
        ls                  List pending relocations
        clear               Clear pending relocations
        en[able]            Enable shuffling
        dis[able]           Disable shuffling

       Options:
        -dc, --only-dc         Apply only to named DC (create only)
        -tp, --thrift-port     Thrift port number (Default: 9160)
        -p,  --port            JMX port number (Default: 7199)
        -tf, --thrift-framed   Enable framed transport for Thrift (Default: false)
        -en, --and-enable      Immediately enable shuffling (create only)
        -H,  --help            Print help information
        -h,  --host            JMX hostname or IP address (Default: localhost)
        -th, --thrift-host     Thrift hostname or IP address (Default: JMX host)




Performance



removenode
                  [Bar chart: Cassandra 1.2 vs. Cassandra 1.1; y-axis 0 to 400, units not shown]




bootstrap
                  [Bar chart: Cassandra 1.2 vs. Cassandra 1.1; y-axis 0 to 500, units not shown]




The End
          • DeCandia, Giuseppe, Deniz Hastorun, Madan Jampani, Gunavardhan
               Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan
               Sivasubramanian, Peter Vosshall, and Werner Vogels. “Dynamo: Amazon’s
               Highly Available Key-value Store.” Web.
          • Low, Richard. “Improving Cassandra's uptime with virtual nodes.” Web.
          • Overton, Sam. “Virtual Nodes Strategies.” Web.
          • Overton, Sam. “Virtual Nodes: Performance Results.” Web.
          • Jones, Richard. “libketama - a consistent hashing algo for memcache
               clients.” Web.



Thursday, February 28, 13                                                            59

Contenu connexe

En vedette

Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseEric Evans
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseEric Evans
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced CassandraEric Evans
 
Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Eric Evans
 
Webinaire Business&Decision - Trifacta
Webinaire  Business&Decision - TrifactaWebinaire  Business&Decision - Trifacta
Webinaire Business&Decision - TrifactaVictor Coustenoble
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Johnny Miller
 
CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)Eric Evans
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architectureT Jake Luciani
 
Cassandra by Example: Data Modelling with CQL3
Cassandra by Example:  Data Modelling with CQL3Cassandra by Example:  Data Modelling with CQL3
Cassandra by Example: Data Modelling with CQL3Eric Evans
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraEric Evans
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraEric Evans
 
CQL: SQL In Cassandra
CQL: SQL In CassandraCQL: SQL In Cassandra
CQL: SQL In CassandraEric Evans
 
It's not you, it's me: Ending a 15 year relationship with RRD
It's not you, it's me: Ending a 15 year relationship with RRDIt's not you, it's me: Ending a 15 year relationship with RRD
It's not you, it's me: Ending a 15 year relationship with RRDEric Evans
 
Lightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkLightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkVictor Coustenoble
 
Time series storage in Cassandra
Time series storage in CassandraTime series storage in Cassandra
Time series storage in CassandraEric Evans
 
User Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and CloudsUser Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and CloudsEran Chinthaka Withana
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 

En vedette (20)

Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-case
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-case
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced Cassandra
 
Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)
 
Webinar Degetel DataStax
Webinar Degetel DataStaxWebinar Degetel DataStax
Webinar Degetel DataStax
 
Webinaire Business&Decision - Trifacta
Webinaire  Business&Decision - TrifactaWebinaire  Business&Decision - Trifacta
Webinaire Business&Decision - Trifacta
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
 
DataStax Enterprise BBL
DataStax Enterprise BBLDataStax Enterprise BBL
DataStax Enterprise BBL
 
CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architecture
 
Cassandra by Example: Data Modelling with CQL3
Cassandra by Example:  Data Modelling with CQL3Cassandra by Example:  Data Modelling with CQL3
Cassandra by Example: Data Modelling with CQL3
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in Cassandra
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in Cassandra
 
CQL: SQL In Cassandra
CQL: SQL In CassandraCQL: SQL In Cassandra
CQL: SQL In Cassandra
 
It's not you, it's me: Ending a 15 year relationship with RRD
It's not you, it's me: Ending a 15 year relationship with RRDIt's not you, it's me: Ending a 15 year relationship with RRD
It's not you, it's me: Ending a 15 year relationship with RRD
 
Lightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkLightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and Spark
 
Time series storage in Cassandra
Time series storage in CassandraTime series storage in Cassandra
Time series storage in Cassandra
 
User Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and CloudsUser Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and Clouds
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0
 

Plus de Eric Evans

Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Eric Evans
 
Cassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLCassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLEric Evans
 
NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?Eric Evans
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraEric Evans
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed DatabaseEric Evans
 
An Introduction To Cassandra
An Introduction To CassandraAn Introduction To Cassandra
An Introduction To CassandraEric Evans
 
Cassandra In A Nutshell
Cassandra In A NutshellCassandra In A Nutshell
Cassandra In A NutshellEric Evans
 

Plus de Eric Evans (9)

Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3
 
Cassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLCassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQL
 
NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache Cassnadra
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
An Introduction To Cassandra
An Introduction To CassandraAn Introduction To Cassandra
An Introduction To Cassandra
 
Cassandra In A Nutshell
Cassandra In A NutshellCassandra In A Nutshell
Cassandra In A Nutshell
 

Rethinking Topology in Cassandra with Virtual Nodes

  • 1. Rethinking Topology in Cassandra ApacheCon North America February 28, 2013 Eric Evans eevans@acunu.com @jericevans Thursday, February 28, 13 1
  • 3. DHT 101: partitioning [ring diagram labels: Z, A]
  • 4. DHT 101: partitioning [ring diagram labels: Z, A, Y, B, C]
  • 5. DHT 101: partitioning [ring diagram labels: Z, A, Y, B, C; Key = Aaa]
  • 6. DHT 101: replica placement [ring diagram labels: Z, A, Y, B, C; Key = Aaa]
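The partitioning and replica placement slides can be sketched as follows. This is a minimal illustration, not Cassandra's implementation. It assumes that tokens are plain strings compared lexically (as the node labels and "Key = Aaa" on the slides suggest), that a key belongs to the first node whose token is greater than or equal to the key (wrapping around the ring), and that the remaining replicas are the next nodes clockwise. The names `primary_index` and `replicas` and the replication factor of 3 are hypothetical.

```python
from bisect import bisect_left

# Node tokens as labelled on the slides, kept sorted to form the ring.
RING = sorted(["A", "B", "C", "Y", "Z"])


def primary_index(ring, key):
    """Index of the first node whose token is >= key, wrapping past the end."""
    return bisect_left(ring, key) % len(ring)


def replicas(ring, key, rf=3):
    """Primary owner plus the next rf-1 nodes clockwise (assumed placement rule)."""
    start = primary_index(ring, key)
    return [ring[(start + i) % len(ring)] for i in range(min(rf, len(ring)))]


if __name__ == "__main__":
    # The key from the slides falls between A and B on the ring.
    print(replicas(RING, "Aaa"))
```

Under these assumptions the key sorts after A and before B, so B is its primary owner and the following two nodes hold the other replicas.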
  • 7. DHT 101: consistency
      • Consistency
      • Availability
      • Partition tolerance
  • 8. DHT 101: scenario: consistency level = one [diagram labels: A, W, ?, ?]
  • 9. DHT 101: scenario: consistency level = all [diagram labels: A, R, ?, ?]
  • 10. DHT 101: scenario: quorum write, R+W > N [diagram labels: A, W, B, ?]
  • 11. DHT 101: scenario: quorum read, R+W > N [diagram labels: ?, B, R, C]
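The R+W > N condition on the quorum slides can be checked by brute force. This is a minimal sketch, assuming N = 3 replicas; the helper name `always_overlap` is hypothetical. It enumerates every possible write set of size W and read set of size R and reports whether they always share at least one replica.

```python
from itertools import combinations


def always_overlap(n, w, r):
    """True if every W-sized write set intersects every R-sized read set."""
    nodes = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(nodes, w)
        for rs in combinations(nodes, r)
    )


if __name__ == "__main__":
    n = 3  # assumed replica count for the illustration
    for w in range(1, n + 1):
        for r in range(1, n + 1):
            # A read is guaranteed to see the latest write only when the sets overlap.
            print(f"W={w} R={r} R+W>N={r + w > n} overlap={always_overlap(n, w, r)}")
```

The two printed columns agree for every combination: a read set and a write set are forced to overlap exactly when R+W > N.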
  • 14. Problem: Poor load distribution
  • 15-19. Distributing Load [ring diagram labels: Z, A, Y, B, C, M]
  • 20-22. Distributing Load [ring diagram labels: Z, A, A1, Y, B, C, M]
  • 23. Problem: Poor data distribution
  • 24. Distributing Data [diagram labels: A, C, D, B]
  • 25. Distributing Data [diagram labels: A, E, C, D, B]
  • 26-27. Distributing Data [diagram labels: A, A, E, C, D, C, D, B, B]
  • 28-29. Distributing Data [diagram labels: A, H, E, C, D, G, F, B]
  • 31. In a nutshell... [diagram labels: host, host, host]
  • 32. Benefits
      • Operationally simpler (no token management)
      • Better distribution of load
      • Concurrent streaming involving all hosts
      • Smaller partitions mean greater reliability
      • Supports heterogeneous hardware
  • 33. Strategies
      • Automatic sharding
      • Fixed partition assignment
      • Random token assignment
  • 34. Strategy: Automatic Sharding
      • Partitions are split when data exceeds a threshold
      • Newly created partitions are relocated to a host with lower data load
      • Similar to sharding performed by Bigtable, or Mongo auto-sharding
  • 35. Strategy: Fixed Partition Assignment
      • Namespace divided into Q evenly-sized partitions
      • Q/N partitions assigned per host (where N is the number of hosts)
      • Joining hosts “steal” partitions evenly from existing hosts
      • Used by Dynamo and Voldemort (described in Dynamo paper as “strategy 3”)
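Fixed partition assignment can be sketched in a few lines. This is an illustration under stated assumptions, not the Dynamo or Voldemort implementation: the function names `initial_assignment` and `join`, the round-robin starting layout, and the rule of always taking from the currently most loaded host are all hypothetical choices made to show "stealing evenly".

```python
from collections import Counter


def initial_assignment(q, hosts):
    """Assign Q fixed partitions round-robin across the starting hosts."""
    return {p: hosts[p % len(hosts)] for p in range(q)}


def join(assignment, new_host):
    """Move about Q/N partitions to new_host, one at a time from the most loaded host."""
    counts = Counter(assignment.values())          # existing hosts only
    target = len(assignment) // (len(counts) + 1)  # Q / N after the join
    moved = []
    for _ in range(target):
        donor = max(counts, key=lambda h: (counts[h], h))
        part = max(p for p, h in assignment.items() if h == donor)
        assignment[part] = new_host
        counts[donor] -= 1
        moved.append(part)
    return moved


if __name__ == "__main__":
    ring = initial_assignment(12, ["h1", "h2", "h3"])
    moved = join(ring, "h4")
    # The number of partitions stays at Q; only their owners change.
    print(sorted(moved), Counter(ring.values()))
```

Note that Q never changes: as data grows, each partition grows with it, and only the owner of a partition changes when a host joins.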
  • 36. Strategy: Random Token Assignment
      • Each host assigned T random tokens
      • T random tokens generated for joining hosts; new tokens divide existing ranges
      • Similar to libketama; identical to Classic Cassandra when T=1
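Random token assignment can be illustrated with a small simulation. This is a sketch under assumptions, not Cassandra's code: the token space of 2**64 unsigned integers, the helper names `add_host` and `ownership`, and the rule that a token owns the range from the previous token up to itself are all chosen for the example.

```python
import random
from collections import Counter

TOKEN_SPACE = 2**64  # assumed size of the token namespace for this sketch


def add_host(ring, host, t, rng):
    """Give host T random tokens; each new token splits an existing range."""
    for _ in range(t):
        ring[rng.randrange(TOKEN_SPACE)] = host


def ownership(ring):
    """Fraction of the token space owned by each host."""
    tokens = sorted(ring)
    owned = Counter()
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1]  # index -1 wraps to the last token on the ring
        # A lone token owns the whole ring, hence the `or TOKEN_SPACE`.
        owned[ring[tok]] += (tok - prev) % TOKEN_SPACE or TOKEN_SPACE
    return {h: size / TOKEN_SPACE for h, size in owned.items()}


if __name__ == "__main__":
    rng = random.Random(2013)
    for t in (1, 256):  # T=1 corresponds to one token per host
        ring = {}
        for host in ("h1", "h2", "h3"):
            add_host(ring, host, t, rng)
        shares = ownership(ring)
        print(t, {h: round(s, 3) for h, s in sorted(shares.items())})
```

Running it with different seeds gives a feel for how the spread of ownership between hosts tends to narrow as T grows.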
  • 37. Considerations
      1. Number of partitions
      2. Partition size
      3. How 1 changes with more nodes and data
      4. How 2 changes with more nodes and data
  • 38. Evaluating
      Strategy        No. Partitions   Partition size
      Random          O(N)             O(B/N)
      Fixed           O(1)             O(B)
      Auto-sharding   O(B)             O(1)
      B ~ total data size, N ~ number of hosts
  • 39. Evaluating
      • Automatic sharding
          • Partition size constant (great)
          • Number of partitions scales linearly with data size (bad)
      • Fixed partition assignment
      • Random token assignment
  • 40. Evaluating
      • Automatic sharding
      • Fixed partition assignment
          • Number of partitions is constant (good)
          • Partition size scales linearly with data size (bad)
          • Higher operational complexity (bad)
      • Random token assignment
  • 41. Evaluating
      • Automatic sharding
      • Fixed partition assignment
      • Random token assignment
          • Number of partitions scales linearly with number of hosts (ok)
          • Partition size increases with more data; decreases with more hosts (good)
  • 42. Evaluating
      • Automatic sharding
      • Fixed partition assignment
      • Random token assignment
  • 44. Configuration: conf/cassandra.yaml
      # Comma separated list of tokens,
      # (new installs only).
      initial_token: <token>,<token>,<token>
    or
      # Number of tokens to generate.
      num_tokens: 256
  • 45. Configuration: nodetool info
      Token            : (invoke with -T/--tokens to see all 256 tokens)
      ID               : 64090651-6034-41d5-bfc6-ddd24957f164
      Gossip active    : true
      Thrift active    : true
      Load             : 92.69 KB
      Generation No    : 1351030018
      Uptime (seconds) : 45
      Heap Memory (MB) : 95.16 / 1956.00
      Data Center      : datacenter1
      Rack             : rack1
      Exceptions       : 0
      Key Cache        : size 240 (bytes), capacity 101711872 (bytes ...
      Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, ...
  • 46. Configuration: nodetool ring
      Datacenter: datacenter1
      ==========
      Replicas: 2
      Address    Rack   Status  State   Load      Owns    Token
                                                          9022770486425350384
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -9182469192098976078
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -9054823614314102214
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8970752544645156769
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8927190060345427739
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8880475677109843259
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8817876497520861779
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8810512134942064901
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8661764562509480261
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8641550925069186492
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8636224350654790732
      ...
  • 47-49. Configuration: nodetool status
      Datacenter: datacenter1
      =======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address   Load     Tokens  Owns   Host ID                               Rack
      UN  10.0.0.1  97.2 KB  256     66.0%  64090651-6034-41d5-bfc6-ddd24957f164  rack1
      UN  10.0.0.2  92.7 KB  256     66.2%  b3c3b03c-9202-4e7b-811a-9de89656ec4c  rack1
      UN  10.0.0.3  92.6 KB  256     67.7%  e4eef159-cb77-4627-84c4-14efbc868082  rack1
  • 50. Migration [ring diagram labels: A, C, B]
  • 51. Migration: edit conf/cassandra.yaml and restart
      # Number of tokens to generate.
      num_tokens: 256
  • 52. Migration: convert to T contiguous tokens in existing ranges [ring diagram labels: A, B, C, plus many additional tokens labelled A]
  • 53. Migration: shuffle [ring diagram labels: A, B, C, plus many additional tokens labelled A]
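The conversion step on slide 52 (one existing range becoming T contiguous tokens) can be sketched as below. This is an illustration only and not the code Cassandra runs: the function name `split_range`, the unsigned token space of 2**64, and the even spacing of the new tokens are assumptions.

```python
def split_range(prev_token, token, t, space=2**64):
    """Split the range (prev_token, token] into t contiguous sub-ranges.

    Returns t tokens in ring order. The last one is the original token, so
    the host owns exactly the same data before and after the split.
    """
    width = (token - prev_token) % space or space  # handles wrap-around
    return [(prev_token + width * (i + 1) // t) % space for i in range(t)]


if __name__ == "__main__":
    # A host that owned (0, 1000] with a single token now holds four tokens.
    print(split_range(0, 1000, 4))
```

Because every new token still belongs to the original host, this step moves no data; redistribution only happens later, during the shuffle.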
  • 54. Shuffle
      • Range transfers are queued on each host
      • Hosts initiate transfer of ranges to self
      • Pay attention to the logs!
  • 55. Shuffle: bin/shuffle
      Usage: shuffle [options] <sub-command>
      Sub-commands:
        create            Initialize a new shuffle operation
        ls                List pending relocations
        clear             Clear pending relocations
        en[able]          Enable shuffling
        dis[able]         Disable shuffling
      Options:
        -dc, --only-dc        Apply only to named DC (create only)
        -tp, --thrift-port    Thrift port number (Default: 9160)
        -p,  --port           JMX port number (Default: 7199)
        -tf, --thrift-framed  Enable framed transport for Thrift (Default: false)
        -en, --and-enable     Immediately enable shuffling (create only)
        -H,  --help           Print help information
        -h,  --host           JMX hostname or IP address (Default: localhost)
        -th, --thrift-host    Thrift hostname or IP address (Default: JMX host)
  • 57. removenode [chart: Cassandra 1.2 vs Cassandra 1.1; axis from 0 to 400, units not given]
  • 58. bootstrap [chart: Cassandra 1.2 vs Cassandra 1.1; axis from 0 to 500, units not given]
  • 59. The End
      • DeCandia, Giuseppe, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. “Dynamo: Amazon’s Highly Available Key-value Store.” Web.
      • Low, Richard. “Improving Cassandra's uptime with virtual nodes.” Web.
      • Overton, Sam. “Virtual Nodes Strategies.” Web.
      • Overton, Sam. “Virtual Nodes: Performance Results.” Web.
      • Jones, Richard. “libketama - a consistent hashing algo for memcache clients.” Web.