Cassandra
    Overview

         
What Is It?
    ●   It is a persistent database, but not an
        RDBMS – more on API later
    ●   It can run as a single instance or as a part
        of a cluster.
    ●   All nodes are equal, no master, no slaves
    ●   The cluster can be distributed within a
        single DC or across multiple DCs.
    ●   Multiple DCs can be Active-Active for
        performance or Active-Passive for DR
                              
Simple API
    ●   Get, Put, Delete – all by key
    ●   Batch put and delete – save wire time
    ●   Range queries (iterate over sequence of
        keys)
    ●   Target individual columns within a row –
        Get and Put
    ●   Native integration available for Hadoop
        MapReduce
    ●   CQL – an SQL-like query language
                              
Consistent Hash Ring
    ●   Conceptually all nodes in a cluster are on
        a ring of hash values, “tokens”
    ●   Each node is assigned a token range on
        the ring
    ●   A key's hash (token) places it on the ring,
        within a specific node's token range
    ●   The hash is consistent, meaning the
        location of data is consistent and
        predictable
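
The lookup described above can be sketched in a few lines. This is a minimal illustration, not Cassandra's code: it assumes an MD5-based hash reduced to a 0..2^127 token space, one token per node, and the convention that a node owns the range ending at its own token. The names `token_for`, `evenly_spaced_tokens` and `owner_index` are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed size of the token space


def token_for(key):
    """Hash a key to a token in [0, RING_SIZE). Simplified, illustrative hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def evenly_spaced_tokens(node_count):
    """Assign each node one token, spaced evenly around the ring (sorted)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def owner_index(tokens, token):
    """Index of the node whose token range contains `token`.

    Assumption: a node owns the range that ends at its own token, so the
    owner is the first node token >= the key's token. Past the last node
    token the lookup wraps around to index 0.
    """
    return bisect.bisect_left(tokens, token) % len(tokens)


tokens = evenly_spaced_tokens(8)
names = ["N%d" % (i + 1) for i in range(8)]
print(names[owner_index(tokens, token_for("K1"))])
```

Because the hash is deterministic, any client holding the token list can compute the owner locally, with no directory lookup.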
                              
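0 => 2^127 (Random Partitioner)
    K1 => H1 (token)
    H1 => R4 (primary = N4)
    N = 3
    RS = N4, N5, N6
    [Figure: ring of eight nodes, N1 to N8 clockwise. Token range Ri is the
    arc ending at node Ni. The token space wraps from 2^127 back to 0
    between N8 and N1. H1 falls in R4, just before N4.]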
                                               
Replication
    ●   Replication Factor (N) determines how
        many replicas exist for each key
    ●   Location of replicas is determined by
        consistent hash ring and the “partitioner”
    ●   Generally, N=3 means data will be placed
        on node N, N+1, N+2 on the ring (This can
        vary based on placement strategy, but is
        predictable)
    ●   Powerful because no query required to
        find the node(s) containing a key
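
A minimal sketch of the simple "N, N+1, N+2" placement, assuming the eight-node ring pictured earlier and ignoring rack-aware or DC-aware strategies. `replica_nodes` is a hypothetical helper, not a Cassandra function.

```python
def replica_nodes(ring, primary_index, replication_factor):
    """Return the primary node plus the next nodes clockwise on the ring."""
    count = min(replication_factor, len(ring))
    return [ring[(primary_index + i) % len(ring)] for i in range(count)]


ring = ["N1", "N2", "N3", "N4", "N5", "N6", "N7", "N8"]

# A key whose token falls in N4's range, with N = 3:
print(replica_nodes(ring, ring.index("N4"), 3))  # ['N4', 'N5', 'N6']

# Placement wraps around the end of the ring:
print(replica_nodes(ring, ring.index("N8"), 3))  # ['N8', 'N1', 'N2']
```

The first call reproduces the replica set from the ring example (RS = N4, N5, N6).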
                              
Consistency
    ●   Consistency is “eventual” in Cassandra –
        it will always work to create N (Replication
        Factor) replicas
    ●   Write Consistency (W) defines how many
        replicas must acknowledge a “put” request
        before it succeeds
    ●   Read Consistency (R) defines how many
        replicas are consulted before responding
    ●   W and R are tunable per request,
        therefore consistency is tunable as well
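
One common way to reason about the tuning is the quorum overlap rule from Dynamo-style systems: if R + W > N, every read set shares at least one replica with every write set. The check below is a small sketch of that arithmetic only. The function names are hypothetical and nothing here talks to a cluster.

```python
def read_sees_latest_write(n, w, r):
    """True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must each be between 1 and N")
    return r + w > n


def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


n = 3
for w, r in [(1, 1), (quorum(n), quorum(n)), (n, 1)]:
    print("N=%d W=%d R=%d overlap=%s" % (n, w, r, read_sees_latest_write(n, w, r)))
```

With N = 3, W = R = 1 favors latency and gives no overlap guarantee. W = R = 2 (quorum) and W = 3 with R = 1 both guarantee overlap.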
                              
Data Modeling
      Example



           
Schema Overview
    ●   Keyspace (“database”) contains one or
        more ColumnFamilies
    ●   ColumnFamily (“table”) contains zero or
        more rows
    ●   A Row must contain one or more columns
    ●   ColumnFamilies are indexed by key
        (“rows”, but more like hash map)
    ●   Rows within the same CF may have
        different number of columns, and different
        column names!!
Example
    UserData (Keyspace)
       UserAttributes (ColumnFamily, sort = UTF8)
                             Age         Sex         Weight
         Ellie
                             4           Female      32
                             Age         Sex
         Sammy
                             2           Male
                             Age         EyeColor    Height    Sex
         Henry
                             2           Blue        30        Male

       UserAccessLog (ColumnFamily, sort = Long)
                             7/20/2010      7/22/2010
         Sammy

                             7/22/2010      7/23/2010        7/24/2010
         Henry
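
The same example can be written as nested maps, which is often the easiest way to picture the model: keyspace, then ColumnFamily, then row key, then columns. This is only a mental model in plain Python. Storing values as strings, and the empty values in UserAccessLog (where the slide shows only column names), are assumptions.

```python
user_data = {  # Keyspace
    "UserAttributes": {  # ColumnFamily, column names compared as UTF8
        "Ellie": {"Age": "4", "Sex": "Female", "Weight": "32"},
        "Sammy": {"Age": "2", "Sex": "Male"},
        "Henry": {"Age": "2", "EyeColor": "Blue", "Height": "30", "Sex": "Male"},
    },
    "UserAccessLog": {  # ColumnFamily, column names compared as Long
        # Only the column names appear on the slide; empty values are assumed.
        "Sammy": {"7/20/2010": "", "7/22/2010": ""},
        "Henry": {"7/22/2010": "", "7/23/2010": "", "7/24/2010": ""},
    },
}


def column_names(keyspace, column_family, key):
    """Column names of one row, sorted as strings (matches the UTF8 family)."""
    return sorted(keyspace[column_family][key])


# Rows in the same ColumnFamily carry different columns:
print(column_names(user_data, "UserAttributes", "Sammy"))  # ['Age', 'Sex']
print(column_names(user_data, "UserAttributes", "Henry"))
```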

                                      
Columns
    ●   Column names (not values) are sorted,
        per key
    ●   32-bit limit on the number of columns per
        key – an entire column must fit in RAM, on
        one machine
    ●   Can retrieve/update/delete all columns,
        columns by name, or range of columns
    ●   A key (or row) must contain at least one
        Column, otherwise it is considered deleted
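
The sort order depends on how column names are compared, which is what "sort = UTF8" and "sort = Long" select in the earlier example. A small sketch, using Python's `str` and `int` as stand-ins for those two comparators (an assumption made for illustration), shows why it matters for range queries over columns:

```python
import bisect


def sorted_names(names, comparator):
    """Order column names the way a ColumnFamily's comparator would."""
    return sorted(names, key=comparator)


def slice_names(ordered, start, finish, comparator):
    """Names between start and finish (inclusive) in an already sorted list."""
    keys = [comparator(name) for name in ordered]
    lo = bisect.bisect_left(keys, comparator(start))
    hi = bisect.bisect_right(keys, comparator(finish))
    return ordered[lo:hi]


names = ["9", "100", "10"]
as_utf8 = sorted_names(names, str)  # ['10', '100', '9']
as_long = sorted_names(names, int)  # ['9', '10', '100']
print(as_utf8, as_long)

# A column range only makes sense relative to the chosen ordering:
print(slice_names(as_long, "9", "10", int))  # ['9', '10']
```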
                             
Thrift Read Methods
    ●   get – return a single column for a single
        key
    ●   get_slice – return multiple columns for a
        single key
    ●   multiget_slice – return multiple columns
        for a list of keys
    ●   get_range_slices – return multiple
        columns for a “range” of keys
    ●   Most use “high level” client (Hector,
        Pycassa, etc)
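
To make the difference between the four calls concrete, here is a toy version over an in-memory map. It is not the Thrift interface: the real methods take column paths, slice predicates and a consistency level, and these simplified signatures are assumptions for illustration.

```python
# Toy ColumnFamily: row key -> {column name: value}.
rows = {
    "Ellie": {"Age": "4", "Sex": "Female", "Weight": "32"},
    "Henry": {"Age": "2", "EyeColor": "Blue", "Height": "30", "Sex": "Male"},
    "Sammy": {"Age": "2", "Sex": "Male"},
}


def get(rows, key, column):
    """A single column for a single key."""
    return rows[key][column]


def get_slice(rows, key, columns):
    """Several named columns for a single key; absent columns are skipped."""
    row = rows.get(key, {})
    return {name: row[name] for name in columns if name in row}


def multiget_slice(rows, keys, columns):
    """Several named columns for each key in a list."""
    return {key: get_slice(rows, key, columns) for key in keys}


def get_range_slices(rows, start_key, end_key, columns):
    """Several named columns for every key between start_key and end_key."""
    # Assumption: keys are compared as plain strings here. Under the Random
    # Partitioner a real cluster walks keys in token order instead.
    selected = [k for k in sorted(rows) if start_key <= k <= end_key]
    return multiget_slice(rows, selected, columns)


print(get(rows, "Ellie", "Age"))
print(get_slice(rows, "Sammy", ["Age", "Weight"]))
print(multiget_slice(rows, ["Ellie", "Sammy"], ["Sex"]))
print(get_range_slices(rows, "Ellie", "Henry", ["Age"]))
```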
Thrift Write Methods
    ●   insert – insert/update a single column for a
        single key (most call this method, “put”)
    ●   batch_mutate – insert/update/remove
        multiple columns for multiple keys in
        multiple ColumnFamilies
    ●   remove – remove a single column (or
        entire row) for a single key
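
The write side can be sketched the same way. Again this is a toy model, not the Thrift API: the real `batch_mutate` takes a nested mutation map and a consistency level, and the flat tuple form below (with `None` meaning "remove") is an assumption made to keep the example short.

```python
def insert(store, column_family, key, column, value):
    """Insert or overwrite one column for one key."""
    store.setdefault(column_family, {}).setdefault(key, {})[column] = value


def remove(store, column_family, key, column=None):
    """Remove one column, or the whole row when no column is named."""
    rows = store.get(column_family, {})
    if column is None:
        rows.pop(key, None)
        return
    row = rows.get(key)
    if row is None:
        return
    row.pop(column, None)
    if not row:
        del rows[key]  # a row with no columns is treated as deleted


def batch_mutate(store, mutations):
    """Apply (column_family, key, column, value) tuples in order.

    A value of None means "remove this column".
    """
    for column_family, key, column, value in mutations:
        if value is None:
            remove(store, column_family, key, column)
        else:
            insert(store, column_family, key, column, value)


store = {}
insert(store, "UserAttributes", "Sammy", "Age", "2")
batch_mutate(store, [
    ("UserAttributes", "Sammy", "Sex", "Male"),
    ("UserAccessLog", "Sammy", "7/20/2010", ""),
    ("UserAttributes", "Sammy", "Age", None),
])
print(store)
```

One batch touches two ColumnFamilies and mixes updates with a removal, which is the wire-time saving the API slide refers to.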



                             
Useful References
    ●   http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
    ●   http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
    ●   http://wiki.apache.org/cassandra/
        - "A description of the cassandra data model"
        - "Architecture Overview"
        - "Operations"
        - "Articles and Presentations"

Editor's Notes

  1. The CQL spec is at version 3, but I believe it is still a bit raw and untested. Thrift is not going away anytime soon.