Cassandra concepts, patterns and anti-patterns

                                                               Dave Gardner
                                                          @davegardnerisme
                                                         ApacheCon EU 2012
Cassandra concepts, patterns and anti-patterns - ApacheCon EU 2012
Agenda

              • Choosing NoSQL
              • Cassandra concepts (Dynamo and Big Table)
              • Patterns and anti-patterns of use


Choosing NoSQL...



1. Find data store that doesn’t use SQL
2. Anything
3. Cram all the things into it
4. Triumphantly blog this success
5. Complain a month later when it bursts into flames

http://www.slideshare.net/rbranson/how-do-i-cassandra/4
“NoSQL DBs trade off
        traditional features to better
        support new and emerging use
        cases”


            http://www.slideshare.net/argv0/riak-use-cases-dissecting-the-solutions-to-hard-problems
More widely used, tested and
            documented software..
            (MySQL first OS release 1998)



            .. for a relatively immature
            product
            (Cassandra first open-sourced in 2008)

Ad-hoc querying..
            (SQL join, group by, having, order)



            .. for a rich data model with
            limited ad-hoc querying ability
            (Cassandra makes you denormalise)


What do we get in return?



Proven horizontal
            scalability

            Cassandra scales reads and
            writes linearly as new nodes
            are added
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
High availability

            Cassandra is fault-resistant
            with tunable consistency levels


Low latency, solid
            performance

            Cassandra has very good write
            performance

* Add pinch of salt




                  http://blog.cubrid.org/dev-platform/nosql-benchmarking/
Operational simplicity

            Homogeneous cluster, no “master” node,
            no single point of failure (SPOF)


Rich data model

            Cassandra is more than simple
            key-value – columns,
            composites, counters,
            secondary indexes
Choosing NoSQL...



“they say … I can’t decide between
            this project and this project even
            though they look nothing like each
            other. And the fact that you can’t
            decide indicates that you don’t
            actually have a problem that
            requires them.”
            http://nosqltapes.com/video/benjamin-black-on-nosql-cloud-computing-and-fast_ip
            (at 30:15)
Or you haven’t learned
         enough about them..


•       What tradeoffs are you making?
         •       How is it designed?
         •       What algorithms does it use?
         •       Are the fundamental design
                 decisions sane?


         http://www.alberton.info/nosql_databases_what_when_why_phpuk2011.html

Concepts...



Amazon Dynamo + Google Big Table

Amazon Dynamo
      • Consistent hashing
      • Vector clocks *
      • Gossip protocol
      • Hinted handoff
      • Read repair
      http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

Google Big Table
      • Columnar
      • SSTable storage
      • Append-only
      • Memtable
      • Compaction
      http://labs.google.com/papers/bigtable-osdi06.pdf

* not in Cassandra
Distributed Hash Table (DHT)

[Diagram: a client and a ring of six nodes numbered 1 to 6; tokens are integers from 0 to 2^127]
[Diagram: client, coordinator node and the ring of six nodes (1 to 6); label: consistent hashing]
[Diagram: client, coordinator node and the ring of six nodes (1 to 6); label: replication factor (RF) 3]
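
The ring described above (tokens, consistent hashing and a replication factor of 3) can be sketched in a few lines. This is a minimal illustration, not Cassandra's implementation: it assumes an MD5-derived token reduced into the 0 to 2^127 range, six evenly spaced nodes, and replicas chosen by walking clockwise from the owner. The names `token_for`, `Ring` and `node1` to `node6` are hypothetical.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 127  # tokens are integers from 0 to 2^127


def token_for(key: str) -> int:
    """Map a row key to a token (assumption: MD5 digest reduced into the token space)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


class Ring:
    """A toy token ring: each node owns the range ending at its own token."""

    def __init__(self, node_tokens: dict):
        # Keep (token, node name) pairs sorted by token so we can binary-search them.
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def replicas(self, key: str, rf: int = 3) -> list:
        """Return up to rf distinct nodes: the owner of the key, then the next nodes clockwise."""
        size = len(self._entries)
        # First node whose token is >= the key's token, wrapping round the ring.
        start = bisect.bisect_left(self._tokens, token_for(key)) % size
        return [self._entries[(start + i) % size][1] for i in range(min(rf, size))]


# Six evenly spaced (hypothetical) nodes, as in the diagrams.
ring = Ring({f"node{i + 1}": i * TOKEN_SPACE // 6 for i in range(6)})
print(ring.replicas("USERID1234"))  # three consecutive nodes on the ring
```

Any node can play coordinator here, because every node can compute the same replica list from the key alone.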
Consistency Level (CL)

     How many replicas must respond to
            declare success?


For read operations

            Level                            Description
            ONE                              1st Response
            QUORUM                           N/2 + 1 replicas
            LOCAL_QUORUM                     N/2 + 1 replicas in local data centre
            EACH_QUORUM                      N/2 + 1 replicas in each data centre
            ALL                              All replicas




            http://wiki.apache.org/cassandra/API#Read

For write operations

            Level                           Description
            ANY                             One node, including hinted handoff
            ONE                             One node
            QUORUM                          N/2 + 1 replicas
            LOCAL_QUORUM                    N/2 + 1 replicas in local data centre
            EACH_QUORUM                     N/2 + 1 replicas in each data centre
            ALL                             All replicas


            http://wiki.apache.org/cassandra/API#Write
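
The "N/2 + 1" in the tables above uses integer division, with N being the replication factor. A small sketch of the arithmetic follows; the function names are hypothetical. The second helper encodes the usual rule of thumb that a read is guaranteed to overlap the latest successful write when the number of replicas read plus the number written exceeds N.

```python
def quorum(replication_factor: int) -> int:
    """N/2 + 1 with integer division, where N is the replication factor."""
    return replication_factor // 2 + 1


def read_overlaps_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > rf


for rf in (1, 3, 5):
    print(f"RF={rf}: QUORUM needs {quorum(rf)} replica(s)")

# QUORUM writes plus QUORUM reads overlap at RF = 3; ONE plus ONE does not.
print(read_overlaps_write(3, quorum(3), quorum(3)))
print(read_overlaps_write(3, 1, 1))
```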

[Diagram: RF = 3, CL = Quorum; client, coordinator node and the ring of six nodes (1 to 6)]
Hinted Handoff

         A hint is written to the coordinator
           node when a replica is down

            http://wiki.apache.org/cassandra/HintedHandoff

[Diagram: RF = 3, CL = Quorum, one node offline; client, coordinator node holding a hint, and the ring of six nodes (1 to 6)]
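
The hinted handoff idea can be sketched as a toy model: the coordinator writes to the replicas that are up, keeps a hint for any replica that is down, and replays the hint once that replica returns. Everything here (`Replica`, `Coordinator`, the node names) is hypothetical and in-memory only. In this sketch hints do not count towards the required acknowledgements.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


@dataclass
class Coordinator:
    hints: list = field(default_factory=list)  # (replica name, key, value) tuples

    def write(self, replicas, key, value, required_acks):
        """Write to live replicas, store hints for dead ones, report success against the CL."""
        acks = 0
        for replica in replicas:
            if replica.up:
                replica.data[key] = value
                acks += 1
            else:
                # Replica is down: remember the write so it can be replayed later.
                self.hints.append((replica.name, key, value))
        return acks >= required_acks

    def replay_hints(self, replicas):
        """Deliver stored hints to replicas that are back up; keep the rest."""
        by_name = {r.name: r for r in replicas}
        remaining = []
        for name, key, value in self.hints:
            target = by_name.get(name)
            if target is not None and target.up:
                target.data[key] = value
            else:
                remaining.append((name, key, value))
        self.hints = remaining


replicas = [Replica("node2"), Replica("node3"), Replica("node4", up=False)]
coordinator = Coordinator()
ok = coordinator.write(replicas, "USERID1234", "Dave", required_acks=2)  # quorum for RF = 3
replicas[2].up = True
coordinator.replay_hints(replicas)
```

With RF = 3 and CL = Quorum the write succeeds even though one replica is offline, and the offline replica catches up when the hint is replayed.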
Read Repair

     Background digest query on-read to
     find and update out-of-date replicas*

            http://wiki.apache.org/cassandra/ReadRepair
                                                 * carried out in the background unless CL:ALL
[Diagram: RF = 3, CL = One; client, coordinator node and the ring of six nodes (1 to 6); background digest query, then update out-of-date replicas]
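
Read repair can be sketched as follows, under simplifying assumptions: each replica is a plain dict of key to (value, timestamp), the digest is an MD5 of the cell, and the repair runs inline rather than in the background. The function names are hypothetical. At CL = One the caller gets the first replica's answer, which may be stale; the repair makes later reads consistent.

```python
import hashlib


def digest(cell):
    """Hash of a (value, timestamp) cell, so replicas can be compared without shipping values."""
    return hashlib.md5(repr(cell).encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """Answer from the first replica, then repair any replica whose digest differs.

    Each replica is a dict mapping key -> (value, timestamp).
    """
    answer = replicas[0].get(key)
    digests = {digest(r.get(key)) for r in replicas}
    if len(digests) > 1:
        # Replicas disagree: the cell with the highest timestamp wins.
        newest = max((r[key] for r in replicas if key in r), key=lambda cell: cell[1])
        for r in replicas:
            if r.get(key) != newest:
                r[key] = newest
    return answer


replicas = [{"name": ("Dave", 1)}, {"name": ("foo", 2)}, {}]
first = read_with_repair(replicas, "name")   # answered by the stale first replica
second = read_with_repair(replicas, "name")  # all replicas now agree
print(first, second)
```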
Big Table...



•       Sparse column based data model
         •       SSTable disk storage
         •       Append-only commit log
         •       Memtable (buffer and sort)
         •       Immutable SSTable files
         •       Compaction

         http://research.google.com/archive/bigtable-osdi06.pdf
         http://www.slideshare.net/geminimobile/bigtable-4820829

Column

[Diagram: a column is a Name and a Value, + timestamp]

Timestamp used for conflict resolution (last write wins)
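
Last-write-wins resolution works column by column, which a short sketch can make concrete. The `Column` type and `merge_rows` helper are hypothetical; timestamps are assumed to be plain integers supplied by the writer, and a tie keeps the version already held, which is a simplification.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # assumption: an integer clock value supplied by the writer


def merge_rows(a: dict, b: dict) -> dict:
    """Merge two versions of a row (name -> Column), keeping the newest version of each column."""
    merged = dict(a)
    for name, column in b.items():
        current = merged.get(name)
        if current is None or column.timestamp > current.timestamp:
            merged[name] = column
    return merged


replica_a = {
    "name": Column("name", "Dave", 1),
    "job": Column("job", "Developer", 1),
}
replica_b = {
    "job": Column("job", "Cruft", 2),  # a later write that touched only one column
}
merged = merge_rows(replica_a, replica_b)
print({name: column.value for name, column in merged.items()})
```

Because each column carries its own timestamp, a write to one column never clobbers a concurrent write to another.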
[Diagram: a series of columns, each a Name and a Value]

we can have millions of columns *

* theoretically up to 2 billion

Row

[Diagram: a Row Key followed by a series of columns, each a Name and a Value]




Column Family

[Diagram: several rows, each a Row Key followed by a series of columns]

we can have billions of rows



Write path

[Diagram: a write goes to the commit log (on disk) and to the memtable (in memory), which buffers writes and sorts data; the memtable is flushed on a time or size trigger to immutable SSTables on disk]
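
The write path can be modelled with a toy in-memory store. This is a sketch under loud assumptions: `ToyStore` is hypothetical, the "disk" is a Python list, the flush trigger is a simple size threshold, and the commit log is never truncated. It only illustrates the shape of the design: append to a log, buffer in a memtable, flush sorted immutable SSTables, compact them later.

```python
class ToyStore:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable, sorted tuples of (key, value), oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted, immutable SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        """Check the memtable, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all SSTables into one, letting newer values replace older ones."""
        if self.sstables:
            merged = {}
            for table in self.sstables:  # oldest first, so newer tables overwrite
                merged.update(table)
            self.sstables = [tuple(sorted(merged.items()))]


store = ToyStore(flush_threshold=2)
for key, value in [("b", 1), ("a", 2), ("a", 3), ("c", 4)]:
    store.write(key, value)
store.compact()
print(store.sstables)
```

Writes never modify an existing SSTable; old values are only discarded when compaction rewrites the files.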

Sorted data written to disk in
                            blocks

              Each “query” can be answered
                from a single slice of disk

          Therefore start from your queries
                and work backwards
Patterns and
                          anti-patterns...


Pattern



                   Storing entities as
                  individual columns
                    under one row

Pattern

                          one row per user



           row:                           USERID1234

           name:                          Dave
           email:                         dave@cruft.co
           job:                           Developer

                        we can use C* secondary indexes to
                        fetch all users with job=developer

Anti-pattern




               Storing whole entity
                as single column
                       blob
Anti-pattern

                          one row per user



           row:                           USERID1234

           data:     {"name":"Dave",
             "email":"dave@cruft.co",
             "job":"Developer"}

                        now we can’t use secondary indexes,
                        nor can we easily update a single field safely

Pattern



                  Mutate just the
                changes to entities,
                  make use of C*
                 conflict resolution
Pattern



           $userCf->insert(
               "USER1234",
               array("job" => "Cruft")
               );

                                  we only update the “job” column, avoiding the race
                                  condition of reading all properties and then writing
                                  them all back when only one has changed

Anti-pattern




                 Lock, read, update



Pattern



                     Don’t overwrite
                   anything; store as
                    time series data

Pattern

                          one row per user; many columns (wide
                          row)

           row:                           USERID1234

           a384cff0-26c1-11e2-81c1-0800200c9a66
           {"action":"create", "name":"Dave"}
           10dc4c40-26c2-11e2-81c1-0800200c9a66
           {"action":"update", "name":"foo"}


                        column name is a type 1 UUID (time based)
                        http://www.famkruithof.net/guid-uuid-timebased.html
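
A short sketch of why type 1 UUIDs suit time series column names: each one embeds its creation time, so columns can be ordered chronologically. The two UUIDs and JSON values are the ones from the slide; the helper name `uuid1_datetime` and the extra "bar" event are hypothetical. Note that ordering must use the embedded timestamp, since sorting the text form would put the "update" event before the "create" event.

```python
import json
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond intervals since 15 October 1582.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_datetime(u: uuid.UUID) -> datetime:
    """Recover the wall-clock time embedded in a type 1 UUID."""
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


# The wide row from the slide, deliberately stored out of order here.
row = {
    uuid.UUID("10dc4c40-26c2-11e2-81c1-0800200c9a66"): '{"action":"update", "name":"foo"}',
    uuid.UUID("a384cff0-26c1-11e2-81c1-0800200c9a66"): '{"action":"create", "name":"Dave"}',
}

# Order columns by embedded timestamp, not by their text form.
timeline = sorted(row.items(), key=lambda item: item[0].time)
for column_name, value in timeline:
    print(uuid1_datetime(column_name).isoformat(), json.loads(value)["action"])

# Appending an event never overwrites an old one: each write gets a fresh column name.
row[uuid.uuid1()] = json.dumps({"action": "update", "name": "bar"})
```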

Pattern




                We can store all
              sorts of stuff as time
                     series
Anti-pattern




                     Order Preserving
                     Partitioner (OPP)
           http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

Pattern




              Distributed counters


Anti-pattern




                        Super Columns
                                (a trap for the unwary)


           http://rubyscale.com/2010/beware-the-supercolumn-its-a-trap-for-the-unwary/

In conclusion...



Cassandra is founded on
   sound design principles


The data model is
                 incredibly powerful


CQL and a new breed
          of clients are making
             it easier to use

Lots of tools and
          integrations exist to
         expand the feature set

There is a strong
      community and multiple
        companies offering
       professional support

Thanks
                                                                     looking for a job?


         Learn more about Cassandra (if you’re ever in London)
         meetup.com/Cassandra-London

         Learn more about the fundamentals
         http://nosqlsummer.org/

         Watch videos from Cassandra SF 2011
         http://www.datastax.com/events/cassandrasf2011/presentations

Extending functionality
         Search via Apache Solr and DataStax Enterprise
         http://www.datastax.com/technologies/solr

         Batch processing via Apache Hadoop and DataStax Enterprise
         http://www.datastax.com/technologies/hadoop

         Real-time analytics via Acunu Reflex
         http://www.acunu.com/acunu-analytics.html


Cassandra concepts, patterns and anti-patterns - ApacheCon EU 2012

Contenu connexe

Tendances

ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAltinity Ltd
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...Altinity Ltd
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitFlink Forward
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용I Goo Lee
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architectureMarkus Klems
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...DataStax
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Stamatis Zampetakis
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loadingalex_araujo
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinDataStax Academy
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsFrederic Descamps
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesAltinity Ltd
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseAltinity Ltd
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 

Tendances (20)

ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architecture
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loading
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and Histograms
 
ClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic ContinuesClickHouse Materialized Views: The Magic Continues
ClickHouse Materialized Views: The Magic Continues
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Optimizing MySQL queries
Optimizing MySQL queriesOptimizing MySQL queries
Optimizing MySQL queries
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 

Similaire à Cassandra concepts, patterns and anti-patterns

Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache CassandraRobert Stupp
 
Understanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraUnderstanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraDataStax
 
cassandra
cassandracassandra
cassandraAkash R
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraFolio3 Software
 
Apache cassandra lunch #82 instaclustr managed cassandra and next.js
Apache cassandra lunch #82  instaclustr managed cassandra and next.jsApache cassandra lunch #82  instaclustr managed cassandra and next.js
Apache cassandra lunch #82 instaclustr managed cassandra and next.jsAnant Corporation
 
Apache Cassandra Lunch #82: Instaclustr Managed Cassandra and Next.js
Apache Cassandra Lunch #82: Instaclustr Managed Cassandra and Next.jsApache Cassandra Lunch #82: Instaclustr Managed Cassandra and Next.js
Apache Cassandra Lunch #82: Instaclustr Managed Cassandra and Next.jsAnant Corporation
 
Latency and Consistency Tradeoffs in Modern Distributed Databases
Latency and Consistency Tradeoffs in Modern Distributed DatabasesLatency and Consistency Tradeoffs in Modern Distributed Databases
Latency and Consistency Tradeoffs in Modern Distributed DatabasesScyllaDB
 
CASSANDRA A DISTRIBUTED NOSQL DATABASE FOR HOTEL MANAGEMENT SYSTEM
CASSANDRA A DISTRIBUTED NOSQL DATABASE FOR HOTEL MANAGEMENT SYSTEMCASSANDRA A DISTRIBUTED NOSQL DATABASE FOR HOTEL MANAGEMENT SYSTEM
CASSANDRA A DISTRIBUTED NOSQL DATABASE FOR HOTEL MANAGEMENT SYSTEMIJCI JOURNAL
 
Learn Cassandra at edureka!
Learn Cassandra at edureka!Learn Cassandra at edureka!
Learn Cassandra at edureka!Edureka!
 
Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...
Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...
Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...DataStax Academy
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraStratio
 
Analyzing_Data_with_Spark_and_Cassandra
Analyzing_Data_with_Spark_and_CassandraAnalyzing_Data_with_Spark_and_Cassandra
Analyzing_Data_with_Spark_and_CassandraRich Beaudoin
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 

Similaire à Cassandra concepts, patterns and anti-patterns (20)

Cassandra
CassandraCassandra
Cassandra
 
Cassandra via-docker
Cassandra via-dockerCassandra via-docker
Cassandra via-docker
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Understanding Data Consistency in Apache Cassandra
Cassandra concepts, patterns and anti-patterns

  • 1. Cassandra concepts, patterns and anti-patterns Dave Gardner @davegardnerisme ApacheCon EU 2012 Cassandra concepts, patterns and anti-patterns - ApacheCon EU 2012
  • 2. Agenda • Choosing NoSQL • Cassandra concepts (Dynamo and Big Table) • Patterns and anti-patterns of use
  • 3. Choosing NoSQL...
  • 4. 1. Find data store that doesn’t use SQL 2. Anything 3. Cram all the things into it 4. Triumphantly blog this success 5. Complain a month later when it bursts into flames http://www.slideshare.net/rbranson/how-do-i-cassandra/4
  • 5. “NoSQL DBs trade off traditional features to better support new and emerging use cases” http://www.slideshare.net/argv0/riak-use-cases-dissecting-the-solutions-to-hard-problems
  • 6. More widely used, tested and documented software... (MySQL first open-source release 1998) ...for a relatively immature product (Cassandra first open-sourced in 2008)
  • 7. Ad-hoc querying... (SQL join, group by, having, order) ...for a rich data model with limited ad-hoc querying ability (Cassandra makes you denormalise)
  • 8. What do we get in return?
  • 9. Proven horizontal scalability: Cassandra scales reads and writes linearly as new nodes are added
  • 11. High availability: Cassandra is fault-resistant with tunable consistency levels
  • 12. Low latency, solid performance: Cassandra has very good write performance
  • 13. * Add pinch of salt http://blog.cubrid.org/dev-platform/nosql-benchmarking/
  • 14. Operational simplicity: homogeneous cluster, no “master” node, no SPOF
  • 15. Rich data model: Cassandra is more than simple key-value (columns, composites, counters, secondary indexes)
  • 16. Choosing NoSQL...
  • 17. “they say … I can’t decide between this project and this project even though they look nothing like each other. And the fact that you can’t decide indicates that you don’t actually have a problem that requires them.” http://nosqltapes.com/video/benjamin-black-on-nosql-cloud-computing-and-fast_ip (at 30:15)
  • 18. Or you haven’t learned enough about them...
  • 19. What tradeoffs are you making? • How is it designed? • What algorithms does it use? • Are the fundamental design decisions sane? http://www.alberton.info/nosql_databases_what_when_why_phpuk2011.html
  • 20. Concepts...
  • 21. Amazon Dynamo (consistent hashing, vector clocks*, gossip protocol, hinted handoff, read repair; http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) + Google Big Table (columnar, SSTable storage, append-only, memtable, compaction; http://labs.google.com/papers/bigtable-osdi06.pdf). * not in Cassandra
  • 22. Distributed Hash Table (DHT): tokens are integers from 0 to 2^127. [Diagram: ring of six nodes numbered 1 to 6, plus a client]
  • 23. Consistent hashing. [Diagram: six-node ring; clients connect to a coordinator node]
  • 24. Replication factor (RF) 3. [Diagram: six-node ring with clients and a coordinator node]
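The ring on slides 22 to 24 can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's partitioner code: it assumes six nodes with evenly spaced tokens, hypothetical node names `node1` to `node6`, and an MD5 hash reduced to the 0 to 2^127 token space. The helper names `token_for` and `replicas_for` are made up for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space from the slide: 0 to 2^127

# Assumption: six nodes with evenly spaced tokens, named node1..node6.
TOKENS = [(i * RING_SIZE // 6, f"node{i + 1}") for i in range(6)]


def token_for(row_key: str) -> int:
    """Hash a row key onto the ring (MD5 here, reduced to the token space)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(row_key: str, rf: int = 3) -> list[str]:
    """Find the first node whose token is >= the key's token, then walk the ring."""
    token = token_for(row_key)
    tokens_only = [t for t, _ in TOKENS]
    # Wrap around to the first node when the key hashes past the last token.
    start = bisect.bisect_left(tokens_only, token) % len(TOKENS)
    return [TOKENS[(start + i) % len(TOKENS)][1] for i in range(rf)]


print(replicas_for("USERID1234"))
```

Any node can act as coordinator because every node can run the same lookup; the replicas are simply the next `rf` nodes round the ring.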
  • 25. Consistency Level (CL): how many replicas must respond to declare success?
  • 26. For read operations (level: description). ONE: 1st response; QUORUM: N/2 + 1 replicas; LOCAL_QUORUM: N/2 + 1 replicas in local data centre; EACH_QUORUM: N/2 + 1 replicas in each data centre; ALL: all replicas. http://wiki.apache.org/cassandra/API#Read
  • 27. For write operations (level: description). ANY: one node, including hinted handoff; ONE: one node; QUORUM: N/2 + 1 replicas; LOCAL_QUORUM: N/2 + 1 replicas in local data centre; EACH_QUORUM: N/2 + 1 replicas in each data centre; ALL: all replicas. http://wiki.apache.org/cassandra/API#Write
  • 28. RF = 3, CL = Quorum. [Diagram: six-node ring with clients and a coordinator node]
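The quorum arithmetic from the tables above is easy to check by hand. The sketch below assumes N is the replication factor and that N/2 is integer division. The `overlaps` helper is an addition not found on the slides: it encodes the commonly cited rule for Dynamo-style stores that a read is guaranteed to touch at least one replica holding the latest write when the read and write replica counts sum to more than RF.

```python
def quorum(replication_factor: int) -> int:
    """N/2 + 1 with integer division, as in the tables on slides 26 and 27."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set shares at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                   # 2
print(overlaps(q, q, rf))  # True: quorum reads overlap quorum writes
print(overlaps(1, 1, rf))  # False: a CL ONE read may miss a CL ONE write
```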
  • 29. Hinted Handoff: a hint is written to the coordinator node when a replica is down http://wiki.apache.org/cassandra/HintedHandoff
  • 30. RF = 3, CL = Quorum, one node offline. [Diagram: six-node ring; the coordinator node stores a hint for the offline replica]
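A toy model of the scenario on slide 30 is sketched below. It is a simplification under stated assumptions: the `Coordinator` class, its `down` set and the `mark_up` method are hypothetical names, replicas are plain dictionaries, and details such as CL ANY and hint expiry are ignored.

```python
class Coordinator:
    """Toy coordinator that keeps hints for replicas that are offline."""

    def __init__(self, replica_names):
        self.stores = {name: {} for name in replica_names}  # one dict per replica
        self.down = set()   # names of replicas currently offline
        self.hints = []     # (replica name, key, value) waiting to be replayed

    def write(self, key, value, required_acks):
        """Send the write to every replica; succeed if enough live ones accept it."""
        acks = 0
        for name, store in self.stores.items():
            if name in self.down:
                self.hints.append((name, key, value))  # remember it for later
            else:
                store[key] = value
                acks += 1
        return acks >= required_acks

    def mark_up(self, name):
        """The replica is back online: replay its hints, keep everyone else's."""
        self.down.discard(name)
        remaining = []
        for target, key, value in self.hints:
            if target == name:
                self.stores[name][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining


coordinator = Coordinator(["node2", "node3", "node4"])
coordinator.down.add("node4")
print(coordinator.write("USERID1234", "Dave", required_acks=2))
coordinator.mark_up("node4")
print(coordinator.stores["node4"])
```

With RF = 3 and a quorum of 2, the write succeeds while one replica is offline, and the hint brings that replica up to date once it returns.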
  • 31. Read Repair: background digest query on-read to find and update out-of-date replicas* http://wiki.apache.org/cassandra/ReadRepair * carried out in the background unless CL:ALL
  • 32. RF = 3, CL = One. [Diagram: six-node ring; the coordinator node issues a background digest query, then updates out-of-date replicas]
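The repair step can be illustrated with a small sketch. It assumes each replica holds one timestamped cell for the key being read, and it compares whole cells rather than digests, so it only shows the "find and update out-of-date replicas" idea. `Cell` and `repair` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp; the highest one wins


def repair(replicas: dict[str, Cell]) -> list[str]:
    """Bring every replica up to the newest cell; return the names that were stale."""
    newest = max(replicas.values(), key=lambda cell: cell.timestamp)
    stale = [name for name, cell in replicas.items() if cell.timestamp < newest.timestamp]
    for name in stale:
        replicas[name] = newest  # overwrite the out-of-date copy
    return stale


replicas = {
    "node1": Cell("Developer", 1),
    "node2": Cell("Cruft", 2),
    "node3": Cell("Developer", 1),
}
print(repair(replicas))
print(replicas["node1"])
```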
  • 33. Big Table...
  • 34. Sparse column based data model • SSTable disk storage • Append-only commit log • Memtable (buffer and sort) • Immutable SSTable files • Compaction http://research.google.com/archive/bigtable-osdi06.pdf http://www.slideshare.net/geminimobile/bigtable-4820829
  • 35. Column: name, value + timestamp. Timestamp used for conflict resolution (last write wins)
  • 36. We can have millions of columns*, each with a name and a value. * theoretically up to 2 billion
  • 37. Row: a row key followed by its columns, each with a name and a value
  • 38. Column Family: many rows, each a row key followed by its columns. We can have billions of rows
  • 39. Write path: a write goes to the commit log (on disk) and to the memtable (in memory), which buffers writes and sorts data. The memtable is flushed on a time or size trigger to SSTables on disk, which are immutable
  • 40. Sorted data written to disk in blocks. Each “query” can be answered from a single slice of disk. Therefore start from your queries and work backwards
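The write path on slides 39 and 40 can be mimicked with a toy store. This is a sketch under simplifying assumptions: a single sorted key space instead of rows and columns, a size-only flush trigger, no compaction, and hypothetical names (`ToyStore`, `flush_threshold`, `slice`).

```python
import bisect


class ToyStore:
    """Toy write path: commit log plus memtable, flushed to immutable sorted runs."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # each entry is a sorted, immutable tuple of pairs
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Sort the memtable and freeze it as a new SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def slice(self, start: str, end: str) -> list:
        """Return (key, value) pairs with start <= key <= end; newer writes win."""
        merged = {}
        for table in self.sstables:  # oldest first, so later tables overwrite
            keys = [k for k, _ in table]
            lo = bisect.bisect_left(keys, start)
            hi = bisect.bisect_right(keys, end)
            merged.update(table[lo:hi])  # one contiguous slice of sorted data
        merged.update({k: v for k, v in self.memtable.items() if start <= k <= end})
        return sorted(merged.items())


store = ToyStore(flush_threshold=2)
for key, value in [("a", "1"), ("b", "2"), ("c", "3"), ("a", "9"), ("d", "4")]:
    store.write(key, value)
print(store.slice("a", "c"))
```

Because each flushed table is sorted, a range of keys sits in one contiguous slice of it, which is the reason for modelling data around the queries you intend to run.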
  • 41. Patterns and anti-patterns...
  • 42. (slide has no text)
  • 43. Pattern: storing entities as individual columns under one row
  • 44. Pattern: one row per user. row: USERID1234; name: Dave; email: dave@cruft.co; job: Developer. We can use C* secondary indexes to fetch all users with job=developer
  • 45. Anti-pattern: storing whole entity as single column blob
  • 46. Anti-pattern: one row per user. row: USERID1234; data: {"name":"Dave", "email":"dave@cruft.co", "job":"Developer"}. Now we can’t use secondary indexes, nor can we easily update safely
  • 47. Pattern: mutate just the changes to entities, make use of C* conflict resolution
  • 48. Pattern: $userCf->insert( "USER1234", array("job" => "Cruft") ); We only update the “job” column, avoiding the race condition that arises from reading all properties and then writing them all back when only one has changed
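Why per-column mutations avoid the race can be shown with a small model of last-write-wins merging. This sketch assumes each column is stored as a `(value, timestamp)` pair and that ties keep the existing value, which is a simplification. `merge_columns` and the replacement email address are hypothetical.

```python
def merge_columns(row: dict, mutation: dict) -> dict:
    """Apply a mutation column by column; the higher timestamp wins per column."""
    merged = dict(row)
    for name, (value, timestamp) in mutation.items():
        if name not in merged or timestamp > merged[name][1]:
            merged[name] = (value, timestamp)
    return merged


row = {"name": ("Dave", 1), "email": ("dave@cruft.co", 1), "job": ("Developer", 1)}

# Two clients each change one column; the mutations may arrive in either order.
job_change = {"job": ("Cruft", 2)}
email_change = {"email": ("dave@example.com", 3)}

first = merge_columns(merge_columns(row, job_change), email_change)
second = merge_columns(merge_columns(row, email_change), job_change)
print(first == second)
print(first["job"], first["email"])
```

Neither client had to read the row first, and neither change is lost. With the single-blob layout, whichever client wrote last would silently discard the other's change.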
  • 49. Anti-pattern: lock, read, update
  • 50. Pattern: don’t overwrite anything; store as time series data
  • 51. Pattern: one row per user; many columns (wide row). row: USERID1234; a384cff0-26c1-11e2-81c1-0800200c9a66: {"action":"create", "name":"Dave"}; 10dc4c40-26c2-11e2-81c1-0800200c9a66: {"action":"update", "name":"foo"}. Column name is a type 1 UUID (time based) http://www.famkruithof.net/guid-uuid-timebased.html
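The two column names on slide 51 show why the ordering must be time based rather than textual. The sketch below uses Python's standard `uuid` module to read the timestamp embedded in each version 1 UUID; the `events` dictionary simply restates the slide's wide row.

```python
import uuid

# Column name -> column value, restating the wide row from the slide.
events = {
    "10dc4c40-26c2-11e2-81c1-0800200c9a66": '{"action":"update", "name":"foo"}',
    "a384cff0-26c1-11e2-81c1-0800200c9a66": '{"action":"create", "name":"Dave"}',
}

# uuid.UUID.time is the 60-bit timestamp embedded in a version 1 UUID.
by_time = sorted(events, key=lambda name: uuid.UUID(name).time)
by_text = sorted(events)

for name in by_time:
    print(name, events[name])
print(by_time == by_text)
```

Sorting by embedded timestamp puts the "create" event before the "update" event, whereas a plain string sort reverses them, so the column comparator has to understand time-based UUIDs for the row to read as a time series.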
  • 52. Pattern: we can store all sorts of stuff as time series
  • 53. Anti-pattern: Order Preserving Partitioner (OPP) http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
  • 54. Pattern: distributed counters
  • 55. Anti-pattern: Super Columns (a trap for the unwary) http://rubyscale.com/2010/beware-the-supercolumn-its-a-trap-for-the-unwary/
  • 56. In conclusion...
  • 57. Cassandra is founded on sound design principles
  • 58. The data model is incredibly powerful
  • 59. CQL and a new breed of clients are making it easier to use
  • 60. Lots of tools and integrations exist to expand the feature set
  • 61. There is a strong community and multiple companies offering professional support
  • 62. Thanks. Looking for a job? Learn more about Cassandra (if you’re ever in London): meetup.com/Cassandra-London. Learn more about the fundamentals: http://nosqlsummer.org/ Watch videos from Cassandra SF 2011: http://www.datastax.com/events/cassandrasf2011/presentations
  • 63. Extending functionality. Search via Apache Solr and DataStax Enterprise: http://www.datastax.com/technologies/solr Batch processing via Apache Hadoop and DataStax Enterprise: http://www.datastax.com/technologies/hadoop Real-time analytics via Acunu Reflex: http://www.acunu.com/acunu-analytics.html

Editor's notes

  1. How should we go about choosing a NoSQL solution? This is the way that NoSQL is often approached. A light-hearted take on both how people approach NoSQL and, to some extent, the tools themselves.
  2. A good way of choosing NoSQL is by considering the tradeoffs.
  3. So we are going to be considering tradeoffs when we make our choice; but how do we know these tradeoffs are worth making?
  4. Ben Black suggests that a telltale sign that you don’t _need_ NoSQL is when you cannot decide between projects.
  5. I would argue that you probably cannot decide because you haven’t learned enough about the solutions and how they will fit your needs.
  6. Learn about how you would model your application; learn about the key design decisions of the project itself; learn about the algorithms it uses.
  7. Cassandra is based on the distribution model of Amazon Dynamo with the data model of Google Big Table. We drop Vector Clocks from Dynamo in favour of column-based “last write wins” conflict resolution from Big Table.
  8. DHT. The usual way of visualising is to draw a ring where we start at hash 0 and move round to our largest number (here 2^127). We pick “tokens” within this space (here we have 6 nodes and hence 6 tokens).
  9. When we write or read data, we calculate a hash of the row key to decide which node should be responsible for the data. We can connect to _any_ node in the cluster to issue our command; this node will act as a coordinator and store/get the data for us.
  10. Replicas are picked by traversing round the tokens.
  11. Here with RF=3 we need two of our 3 replicas to respond to consider the operation a success. When writing, all 3 replicas will still receive the write immediately; we just won’t _require_ them all to respond to consider an operation successful (for CL < ALL).
  12. Pre 1.0, hints were written to a live replica. Now they are written to the coordinator. The hint tells the coordinator to update the unreachable replica when it comes back online.
  13. Reading at CL:ONE, we get our result from #1. There is then a percentage chance that we’ll query #2 and #3 for their value and then update any out-of-date replicas.
  14. I work for Hailo; these are patterns I find myself using frequently and anti-patterns I try to avoid.