Shunsuke Nakamura / @sunsuk7tp
Tokyo Institute of Technology, Master Course
Tokyo, Japan
[Figure: two panels, "Update latency in write-heavy workload" and "Read latency in read-heavy workload", each comparing write-optimized and read-optimized data stores; an arrow marks the better direction.]
  Data store         Performance       Storage engine   Distribution
  Apache HBase       write-optimized   Bigtable-like    centralized
  Apache Cassandra   write-optimized   Bigtable-like    decentralized
  Sharded MySQL      read-optimized    MySQL            centralized
  Yahoo! Sherpa      read-optimized    MySQL            centralized



The storage engine determines which workloads a data store
  handles efficiently.
The distribution architecture of a data store is independent of
  its read and write performance characteristics.

For example, if the storage engine is exchanged for MySQL, how
  do the read and write characteristics change?
What is MyCassandra?
  Cassandra   = Dynamo + Bigtable
                Dynamo: distribution (P2P/decentralized)
                Bigtable: storage engine
  MyCassandra = Dynamo + a selectable storage engine (Bigtable, MySQL, Redis, ...)
MyCassandra is a modular distributed data store.
  You can select a storage engine for each keyspace, based on:
      Index algorithm
          Read-optimized vs. write-optimized
          Sequential or random
      Volatile or persistent
      Your experience with the storage engine
    MySQL (B+-Trees)
       read-optimized

    Bigtable (LSM-Tree)
       write-optimized; Cassandra's original engine

    Redis (hash)
       in-memory, with asynchronous snapshots

    MongoDB (B-Tree)
       schema-less, document-oriented database

    KyotoCabinet (hash/B+-Tree)
       simple pluggable DBM (extended TokyoCabinet)
  You can adapt any data store to MyCassandra, a scalable data store.
  •  RDB (MySQL/PostgreSQL)

  You can apply it to applications whose I/O characteristics change by phase.
  •  MapReduce: Map - Shuffle - Reduce
  •  Full-text search: crawl - indexing - search

  You can apply it to any IaaS environment.
  •  EC2 + RDS (MyCassandra with MySQL)
[Figure: max. QPS for 40 clients for the Bigtable, MySQL, and Redis storage engines under Write Only, Write Heavy, Read Heavy, and Read Only workloads; y-axis from 0 to 40000 qps, higher is better.]
[Diagram labels: client, proxy, server, engine, select]
  client
    •  o.a.c.cli
    •  o.a.c.avro/thrift
  proxy
    •  o.a.c.service.StorageProxy
  server
    •  o.a.c.service.StorageService
    •  o.a.c.db.ReadVerbHandler/RowMutationVerbHandler
  engine
    •  o.a.c.db.Table (by a keyspace)
         o.a.c.db.commitlog
         o.a.c.db.ColumnFamilyStore (by a columnfamily)
          o.a.c.db.engine.StorageEngineInterface
          o.a.c.db.engine.MySQLInstance, RedisInstance, MongoDBInstance, …
    Currently supported
      •  put (key, cf)
         Insert/Update/Delete
      •  get (key)
         At a minimum, you implement these two methods (put and get).
      •  getRangeSlice (startWith, engWith, maxResults)
      •  truncate/dropTable/dropDB
    Planned
      •  secondaryIndex
      •  expire
      •  counter (Cassandra-0.8 ~)
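
The engine methods listed above can be sketched as a small pluggable interface. This is an illustrative Python sketch, not MyCassandra's actual Java API: `StorageEngine`, `DictEngine`, the snake_case method names, and the `engines` mapping are hypothetical stand-ins for `o.a.c.db.engine.StorageEngineInterface` and its implementations. Deletion is left out for brevity.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class StorageEngine(ABC):
    """Hypothetical stand-in for a pluggable storage engine."""

    @abstractmethod
    def put(self, key: str, cf: bytes) -> None:
        """Insert or update the serialized column family stored under key."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the serialized column family for key, or None if absent."""

    def get_range_slice(self, start: str, end: str, max_results: int) -> List[Tuple[str, bytes]]:
        """Optional range scan; an engine may leave it unimplemented."""
        raise NotImplementedError("range scans are optional in this sketch")


class DictEngine(StorageEngine):
    """Toy in-memory engine backed by a dict (assumption: no persistence)."""

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def put(self, key: str, cf: bytes) -> None:
        self._rows[key] = cf

    def get(self, key: str) -> Optional[bytes]:
        return self._rows.get(key)

    def get_range_slice(self, start: str, end: str, max_results: int) -> List[Tuple[str, bytes]]:
        # Sort keys on demand, then keep those in the closed range [start, end].
        keys = sorted(k for k in self._rows if start <= k <= end)
        return [(k, self._rows[k]) for k in keys[:max_results]]


# One engine instance per keyspace (the keyspace name "users" is made up).
engines: Dict[str, StorageEngine] = {"users": DictEngine()}
engines["users"].put("sato", b"gender;male;age;17")
engines["users"].put("suzuki", b"gender;female;age;21")
```

Only `put` and `get` are abstract here, mirroring the note that those two methods are the minimum an engine has to provide.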
  The data model is the same as Cassandra's.
     •  Super columns are not supported yet.
  Data is stored in the same key/value format as an SSTable.
     •  This supports a NoSQL store of any data model.
  For a NoSQL store whose data model has fewer dimensions than Cassandra's:
     •  Add a prefix to the primary key.
     •  The prefix is the keyspace/column family name.
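
The key-prefix idea can be sketched in a few lines. This is an assumption-laden illustration: the `:` separator follows the `A:sato` style used in the deck's key-value example, and the function names `to_engine_key` and `from_engine_key` are hypothetical.

```python
SEPARATOR = ":"  # assumption: matches the "A:sato" style; the real separator may differ


def to_engine_key(column_family: str, row_key: str) -> str:
    """Build the flat key stored in a key-value engine."""
    return f"{column_family}{SEPARATOR}{row_key}"


def from_engine_key(engine_key: str) -> tuple:
    """Split a flat key back into (column_family, row_key)."""
    # Split only on the first separator so row keys may contain it.
    column_family, row_key = engine_key.split(SEPARATOR, 1)
    return column_family, row_key


flat = to_engine_key("A", "sato")
```

Splitting on the first separator only keeps the mapping reversible as long as the column family name itself never contains the separator.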
Cassandra       MySQL      Redis
keyspace        database   db
column family   table      record
column          field
RDB (MySQL): one database, one table per column family

  table A
    key      values
    sato     gender;male;age;17
    suzuki   gender;female;age;21;region;Tokyo

  table B
    key      values
    sato     visits;18;plan;Gold
    suzuki   visits;214;plan;Bronze

KVS (Redis): one db

  key        values
  A:sato     …
  B:ito      …
  A:suzuki   …
  B:tanaka   …

Bigtable (Cassandra): one keyspace

  columnfamily A
    key      gender   age   region
    sato     male     17    [null]
    suzuki   female   21    Tokyo

  columnfamily B
    key      visits   plan
    sato     18       Gold
    suzuki   214      Bronze
 A key and a value, where the value is a serialized object (currently)
                # easy to change
 A column is mapped to a MySQL field
     •  This has lower overhead, but a schema is needed.
 Add specialized columns
     •  For secondary search
     •  For range queries

      Column            Role
      rowKey            Primary key (Key)
      CF                Serialized object (Value)
      counter           Specialized column
      secondary index   For secondary search
      token             For range search
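
A minimal sketch of such a table, under stated assumptions: SQLite from the Python standard library stands in for MySQL, the column types and the table and index names are illustrative, JSON stands in for the object serialization, and the token values are made up. The counter and secondary-index columns are omitted.

```python
import json
import sqlite3

# Assumption: SQLite (stdlib) stands in for MySQL; column types are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE column_family_a (
        row_key TEXT PRIMARY KEY,  -- primary key (Key)
        cf      BLOB NOT NULL,     -- serialized object (Value)
        token   INTEGER NOT NULL   -- specialized column for range search
    )
    """
)
conn.execute("CREATE INDEX idx_token ON column_family_a (token)")


def put(row_key: str, columns: dict, token: int) -> None:
    """Serialize the columns into one value and upsert the row."""
    blob = json.dumps(columns, sort_keys=True).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO column_family_a (row_key, cf, token) VALUES (?, ?, ?)",
        (row_key, blob, token),
    )


def get(row_key: str):
    """Return the deserialized columns, or None if the row is absent."""
    row = conn.execute(
        "SELECT cf FROM column_family_a WHERE row_key = ?", (row_key,)
    ).fetchone()
    return None if row is None else json.loads(row[0])


def range_by_token(start: int, end: int, max_results: int) -> list:
    """Range search over the token column, ordered by token."""
    rows = conn.execute(
        "SELECT row_key FROM column_family_a "
        "WHERE token BETWEEN ? AND ? ORDER BY token LIMIT ?",
        (start, end, max_results),
    ).fetchall()
    return [r[0] for r in rows]


put("sato", {"gender": "male", "age": "17"}, token=10)
put("suzuki", {"gender": "female", "age": "21", "region": "Tokyo"}, token=42)
```

Because the whole row lives in one serialized value, no per-column schema is needed; the trade-off named above is that every read deserializes the full object.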
    A heterogeneous cluster
     •  It combines multiple types of nodes, each running a
        different storage engine.
     •  Replicas of a record are placed on nodes with different
        storage engines.
     •  A proxy routes each query to the nodes that process it
        efficiently.
  write query: sync to W (Bigtable), async to R (MySQL)
  read query:  async to W (Bigtable), sync to R (MySQL)

  •  W: write-optimized (e.g. Bigtable)
  •  R: read-optimized (e.g. MySQL)
  •  RW: memory-based (e.g. Redis)
  A MyCassandra Cluster keeps the same consistency strength as Cassandra.
Quorum protocol: (write agreements) + (read agreements) > (replicas)
  •  This protocol guarantees that a read obtains the most recent value from at least one replica.
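
The quorum condition can be checked with simple arithmetic. The sketch below is illustrative; the function names `is_strict_quorum` and `min_overlap` are hypothetical and do not come from Cassandra or MyCassandra.

```python
def is_strict_quorum(write_acks: int, read_acks: int, replicas: int) -> bool:
    """True when every read set must overlap every write set."""
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("agreement counts must be between 1 and the replica count")
    return write_acks + read_acks > replicas


def min_overlap(write_acks: int, read_acks: int, replicas: int) -> int:
    """Smallest number of replicas shared by any write set and any read set."""
    return max(0, write_acks + read_acks - replicas)
```

With 3 replicas and 2 agreements on each side, any read set shares at least one replica with any write set, which is why a read can always find the latest value.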

  Our system needs one node that synchronously processes
  both read and write queries.
   Memory-based node (Redis)
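
The routing rule can be sketched as a split of replicas into synchronous and asynchronous targets by node type. This is a simplified illustration, not the StorageProxy logic: `SYNC_TYPES`, `plan_query`, and the node names are hypothetical.

```python
from typing import Dict, List

# Node types from the legend: W (write-optimized), R (read-optimized), RW (memory-based).
SYNC_TYPES = {
    "write": {"W", "RW"},
    "read": {"R", "RW"},
}


def plan_query(query_type: str, replicas: Dict[str, str]) -> Dict[str, List[str]]:
    """Split replica nodes into synchronous and asynchronous targets.

    replicas maps a hypothetical node name to its type ("W", "RW" or "R").
    """
    if query_type not in SYNC_TYPES:
        raise ValueError(f"unknown query type: {query_type}")
    preferred = SYNC_TYPES[query_type]
    sync = sorted(n for n, t in replicas.items() if t in preferred)
    async_ = sorted(n for n, t in replicas.items() if t not in preferred)
    return {"sync": sync, "async": async_}


replicas = {"node1": "W", "node2": "RW", "node3": "R"}
```

The RW node appears in the synchronous set for both query types, so a write quorum and a read quorum of two always share it.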
  [Diagram: a write query goes synchronously to W (Bigtable) and asynchronously to R (MySQL). With an RW node in between, writes are handled by W and RW, and reads by RW and R.]

  Write path (3 replicas, quorum of 2; W:RW:R = 1:1:1)
  [Diagram: Client, Proxy, and the nodes responsible for a record (W, RW, R). The proxy waits for two acks for the write and returns; the remaining write is asynchronous.]
  1)  A proxy broadcasts the query to the nodes.
  2)  The proxy waits for acks (two here).
  3a) write success: the proxy returns a success message to the client.
  3b) write failure: the proxy waits for acks from total …
  4)  The proxy asynchronously waits for acks from the remaining …
  Write latency: max (W, RW)
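
Why the write latency is max (W, RW) can be seen from a small model: when all replicas are written in parallel, the proxy returns as soon as the quorum-th ack arrives. The sketch below assumes hypothetical latencies (not measured values) and a helper named `quorum_write_latency`.

```python
from typing import Dict


def quorum_write_latency(ack_latency_ms: Dict[str, float], quorum: int) -> float:
    """Time until the quorum-th ack arrives when all replicas are written in parallel."""
    if not 1 <= quorum <= len(ack_latency_ms):
        raise ValueError("quorum must be between 1 and the number of replicas")
    return sorted(ack_latency_ms.values())[quorum - 1]


# Hypothetical per-node write latencies in milliseconds (not measured values).
latency = {"W": 1.0, "RW": 0.5, "R": 8.0}
observed = quorum_write_latency(latency, quorum=2)
```

As long as the read-optimized node is the slowest writer, the second ack comes from the slower of W and RW, so the slow R write stays off the critical path.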
  Read path (3 replicas, quorum of 2; W:RW:R = 1:1:1)
  [Diagram: Client, Proxy, and the nodes responsible for a record (W, RW, R). The proxy checks consistency and returns the result, then checks consistency with the remaining node asynchronously.]
  1)  A proxy sends a request to an R or RW node, and a digest request to the other
      replicas.
  2)  The proxy waits for … replies, including the specified record.
  3a) success: if the record and digests are consistent, the proxy returns the
      record to the client.
  3b) failure or inconsistency: the proxy tries to read and collect digests until
      they satisfy the quorum.
  4)  The proxy waits for replies from the remaining nodes after replying to the
      client.
      If there is an inconsistency, it resolves it using Read Repair.
  Read latency: max (R, RW)
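
The digest comparison and repair steps can be sketched as follows. This is a simplified illustration rather than Cassandra's implementation: it assumes timestamp-based conflict resolution, uses SHA-256 as the digest function, compares only one data node with one digest node, and the names `digest` and `quorum_read` are hypothetical.

```python
import hashlib
from typing import Dict, Tuple

Versioned = Tuple[int, bytes]  # (timestamp, value); timestamp resolution is an assumption


def digest(record: Versioned) -> str:
    """Digest that a replica returns instead of the full record."""
    timestamp, value = record
    return hashlib.sha256(str(timestamp).encode() + b":" + value).hexdigest()


def quorum_read(replicas: Dict[str, Versioned], data_node: str, digest_node: str) -> bytes:
    """Read the record from data_node and compare it with digest_node's digest.

    On a mismatch, take the record with the newest timestamp and write it back
    to the stale replica (a simplified read repair).
    """
    record = replicas[data_node]
    if digest(record) == digest(replicas[digest_node]):
        return record[1]
    newest = max(record, replicas[digest_node], key=lambda r: r[0])
    for node in (data_node, digest_node):
        replicas[node] = newest  # repair the stale copy
    return newest[1]


# Hypothetical state: the R node missed the latest asynchronous write.
state = {"W": (2, b"new"), "RW": (2, b"new"), "R": (1, b"old")}
result = quorum_read(state, data_node="R", digest_node="RW")
```

Here the RW node holds the newer version, so the read returns it and the stale R replica is overwritten in passing.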
[Figure: max. qps for 40 clients, Cassandra vs. MyCassandra Cluster; y-axis from 0 to 20000 query/sec, higher is better.]

  Workload      [write:read]   MyCassandra Cluster relative to Cassandra
  Write-Only    [100:0]        × 0.90
  Write-Heavy   [50:50]        × 0.93
  Read-Heavy    [5:95]         × 1.54
  Read-Only     [0:100]        × 6.53

                  •  YCSB / Zipfian
                  •  Throughput was up to 6.53 times that of Cassandra.
                  •  In the Write-Heavy workload, multiple read repairs occur.
 MyCassandra-0.2.2
 •  secondaryIndex
      Applied to MySQL and MongoDB
 MyCassandra-0.3.0
 •  Based on Cassandra-0.8
 •  Atomic counter
 •  Brisk (Hadoop + Cassandra)…
1.    Asynchronous deletion
2.    Engine failure detection
3.    Support for ad hoc query
    Cassandra's delete/expire operation
     •  Logical deletion using a tombstone
     •  Actual deletion during SSTable compaction
        This approach depends on the Bigtable engine.

  MyCassandra (MySQL, Redis, …)
     •  Synchronous deletion (currently)
     •  The expire function works well, but the data continues to exist.
     •  Asynchronous deletion is a heavy operation
          I/O goes to one big table, unlike an SSTable (which is a data subset).
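
The two-phase deletion described above can be sketched with a toy store: a delete only writes a marker, and a later compaction pass removes the marked rows. `TombstoneStore` and its methods are hypothetical names, and the in-memory dict is an assumption that ignores SSTables entirely.

```python
from typing import Dict, Optional

TOMBSTONE = object()  # marker for a logically deleted row


class TombstoneStore:
    """Toy store with logical deletion and a separate compaction pass."""

    def __init__(self) -> None:
        self._rows: Dict[str, object] = {}

    def put(self, key: str, value: bytes) -> None:
        self._rows[key] = value

    def delete(self, key: str) -> None:
        # Logical deletion: cheap, only writes a marker.
        self._rows[key] = TOMBSTONE

    def get(self, key: str) -> Optional[bytes]:
        value = self._rows.get(key)
        return None if value is TOMBSTONE or value is None else value

    def compact(self) -> int:
        """Physically drop tombstoned rows; returns how many were removed."""
        dead = [k for k, v in self._rows.items() if v is TOMBSTONE]
        for k in dead:
            del self._rows[k]
        return len(dead)

    def physical_size(self) -> int:
        """Number of rows still stored, tombstones included."""
        return len(self._rows)


store = TombstoneStore()
store.put("sato", b"v1")
store.put("suzuki", b"v2")
store.delete("sato")
size_before = store.physical_size()
removed = store.compact()
```

Until `compact` runs, the deleted row is invisible to reads but still occupies space, which is the "data continues to exist" situation noted for non-Bigtable engines.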
 When only the storage engine fails:
  how should the instance detect the failure and behave?

 With several storage engines and a partial failure:
  how should the instance behave?

 [Diagram: each instance detects engine failure by periodic polling. When an engine goes down, what should the instance do? Treat it as an overall instance failure (node down)? Take over the other node?]
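
One polling round over the engines of an instance could look like the sketch below. It only detects and classifies failures; what the instance should do in the partial case is the open question raised here and is deliberately left out. The names `poll_engines`, `instance_state`, and `broken_ping` are hypothetical, and the ping callables stand in for real engine health checks.

```python
from typing import Callable, Dict


def poll_engines(pings: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run one polling round; an exception from a ping counts as a failure."""
    status = {}
    for name, ping in pings.items():
        try:
            status[name] = bool(ping())
        except Exception:
            status[name] = False
    return status


def instance_state(status: Dict[str, bool]) -> str:
    """Classify the instance: 'up', 'partial' (some engines down) or 'down'."""
    alive = sum(status.values())
    if alive == len(status):
        return "up"
    return "partial" if alive > 0 else "down"


def broken_ping() -> bool:
    raise ConnectionError("engine unreachable")  # simulated engine failure


status = poll_engines({"mysql": lambda: True, "redis": broken_ping})
```

In a running instance this round would be repeated on a timer, and the resulting state would feed whatever takeover policy is chosen.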
    Ad hoc queries and data models
     •  If a feature does not depend on the distributed architecture, it can
      be added easily.
        Data model of Redis (List, Set, ..)
        Document data model and ad hoc queries of MongoDB
     •  But if it does depend on it, it cannot be supported.
          Atomic queries across multiple keys
          Join


     It is important to determine whether the query
     depends on the distributed mechanism.
 github
  •  https://github.com/sunsuk7tp/MyCassandra/
 Twitter
  •  @MyCassandraJP
  •  @_MyCassandra # @MyCassandra had already been taken!!
  •  @sunsuk7tp # my private account


 Google Groups
  •  https://groups.google.com/group/my-cassandra
Thank you!

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

MyCassandra (Full English Version)

  • 1. Shunsuke Nakamura / @sunsuk7tp Tokyo Institute of Technology Master Course Tokyo, Japan
  • 2. [Figure: two panels, "Update latency in write-heavy workload" and "Read latency in read-heavy workload", each comparing write-optimized and read-optimized data stores.]
  • 3. Data store         Performance       Storage engine   Distribution
       Apache HBase       write-optimized   Bigtable-like    centralized
       Apache Cassandra   write-optimized   Bigtable-like    decentralized
       Sharded MySQL      read-optimized    MySQL            centralized
       Yahoo! Sherpa      read-optimized    MySQL            centralized
       The storage engine determines which workload a data store handles efficiently. The distribution architecture of a data store is independent of its read and write performance characteristics. For example, if the storage engine is exchanged for MySQL, how do the read and write characteristics change?
  • 5. = Dynamo + Bigtable
  • 6. = Dynamo (distribution: P2P/decentralized) + Bigtable (storage engine)
  • 7. = Dynamo (distribution: P2P/decentralized) + [storage engine]
  • 8. = Dynamo + storage engine (MySQL, Bigtable, Redis, …)
  • 9. MyCassandra is a modular distributed data store. You can select a storage engine per keyspace, according to:
       • Index algorithm: read-optimized vs. write-optimized; sequential or random
       • Volatile or persistent
       • Your experience with the storage engine
  • 10. MySQL (B+-Tree): read-optimized.
       Bigtable (LSM-Tree): write-optimized; Cassandra’s original engine.
       Redis (hash): in-memory, with asynchronous snapshots.
       MongoDB (B-Tree): schema-less, document-oriented DB.
       KyotoCabinet (hash/B+-Tree): simple pluggable DBM (extended TokyoCabinet).
  • 11. You can adapt any data store to MyCassandra, a scalable data store.
       • RDB (MySQL/PostgreSQL)
       You can apply it to applications whose I/O characteristics change from phase to phase.
       • MapReduce: map, shuffle, reduce
       • Full-text search: crawl, indexing, search
       You can deploy it in any IaaS environment.
       • EC2 + RDS (MyCassandra with MySQL)
  • 12. [Chart: max. QPS for 40 clients with the Bigtable, MySQL, and Redis storage engines under Write-Only, Write-Heavy, Read-Heavy, and Read-Only workloads; y-axis 0 to 40,000 qps, higher is better.]
  • 14. proxy   client client •  o.a.c.cli •  o.a.c.avro/thrift server   proxy •  o.a.c.service.StorageProxy   server engine •  o.a.c.service.StorageService •  o.a.c.db.ReadVerbHandler/RowMutationVerbHandler   engine •  o.a.c.db.Table (by a keyspace)   o.a.c.db.commitlog   o.a.c.db.ColumnFamilyStore (by a columnfamily)   o.a.c.db.engine.StorageEngineInterface   o.a.c.db.engine.MySQLInstance, RedisInstance, MongoDBInstance, …
  • 15. Currently supported
       • put (key, cf): insert/update/delete
       • get (key)
       • getRangeSlice (startWith, endWith, maxResults)
       • truncate/dropTable/dropDB
       At a minimum, you must implement these two methods: put and get.
       Planned
       • secondaryIndex
       • expire
       • counter (Cassandra-0.8 ~)
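The engine operations listed on this slide can be sketched as a small abstract interface. This is a Python illustration only: the real interface is the Java `o.a.c.db.engine.StorageEngineInterface`, whose exact signatures are not shown in the slides. The class names `StorageEngine` and `InMemoryEngine`, the snake_case method names, and the convention that `put` with `None` means delete are all assumptions made for this sketch.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class StorageEngine(ABC):
    """Hypothetical stand-in for a pluggable storage engine."""

    @abstractmethod
    def put(self, key: str, cf: Optional[bytes]) -> None:
        """Insert or update a serialized column family; None deletes (assumed)."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the serialized column family for key, or None if absent."""

    def get_range_slice(
        self, start_with: str, end_with: str, max_results: int
    ) -> List[Tuple[str, bytes]]:
        # Optional: engines without ordered keys may leave this unimplemented.
        raise NotImplementedError


class InMemoryEngine(StorageEngine):
    """Toy engine backed by a dict, used only to exercise the interface."""

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def put(self, key: str, cf: Optional[bytes]) -> None:
        if cf is None:
            self._rows.pop(key, None)
        else:
            self._rows[key] = cf

    def get(self, key: str) -> Optional[bytes]:
        return self._rows.get(key)

    def get_range_slice(
        self, start_with: str, end_with: str, max_results: int
    ) -> List[Tuple[str, bytes]]:
        # Inclusive key range, returned in sorted key order.
        keys = sorted(k for k in self._rows if start_with <= k <= end_with)
        return [(k, self._rows[k]) for k in keys[:max_results]]
```

Only `put` and `get` are abstract here, mirroring the slide's note that those two methods are the minimum an engine has to provide.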
  • 16. The data model is the same as Cassandra's.
       • Super columns are not supported yet.
       Data is stored in the same key/value format as an SSTable.
       • This makes it possible to support a NoSQL store of any data model.
       For a NoSQL store whose data model has fewer dimensions than Cassandra's:
       • A prefix is added to the primary key.
       • The prefix denotes the keyspace/column family name.
  • 17. Cassandra MySQL Redis keyspace database db column family table record column field
  • 18. RDB (MySQL): one database, with tables A and B
         table A:  key      values
                   sato     gender;male;age;17
                   suzuki   gender;female;age;21;region;Tokyo
         table B:  key      values
                   sato     visits;18;plan;Gold
                   suzuki   visits;214;plan;Bronze
       KVS (Redis): one db
                   key        values
                   A:sato     …
                   B:ito      …
                   A:suzuki   …
                   B:tanaka   …
       Bigtable (Cassandra): one keyspace, with column families A and B
         columnfamily A:  key      gender   age   region
                          sato     male     17    [null]
                          suzuki   female   21    Tokyo
         columnfamily B:  key      visits   plan
                          sato     18       Gold
                          suzuki   214      Bronze
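The mapping in this example can be sketched in a few lines of Python. The `A:sato` key form and the `gender;male;age;17` value form are taken from the slide; the function names are hypothetical, and the sketch assumes column names and values never contain `;` (MyCassandra's actual serialization is not described here).

```python
from typing import Dict


def prefixed_key(column_family: str, row_key: str) -> str:
    """Build a flat key-value-store key such as 'A:sato'."""
    return f"{column_family}:{row_key}"


def serialize_columns(columns: Dict[str, object]) -> str:
    """Flatten columns into 'name;value;name;value' (no escaping, assumed)."""
    parts = []
    for name, value in columns.items():
        parts.extend([name, str(value)])
    return ";".join(parts)


def deserialize_columns(text: str) -> Dict[str, str]:
    """Inverse of serialize_columns; every value comes back as a string."""
    fields = text.split(";") if text else []
    return dict(zip(fields[0::2], fields[1::2]))


# Row 'sato' of column family A, as it would be stored in a flat store.
key = prefixed_key("A", "sato")
value = serialize_columns({"gender": "male", "age": 17})
```

Because the column family name lives in the key prefix, a store with a flat keyspace can hold rows from several column families without collisions between identical row keys.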
  • 19. Currently, a row is stored as a key and a value holding a serialized object (this can be changed easily).
       Alternatively, a column can be mapped to a MySQL field.
       • This has smaller overhead, but a schema is needed.
       Specialized columns can be added:
       • For secondary search
       • For range queries
       Row layout:
         rowKey            primary key (Key)
         CF                serialized object (Value)
         counter           specialized column
         secondary index   for secondary search
         token             for range search
  • 20. A heterogeneous cluster
       • It combines multiple types of nodes, each running a different storage engine.
       • The replicas of each piece of data are placed on different storage engines.
       • A proxy routes each query to the nodes that can process it efficiently.
       [Figure: a write query is sent synchronously to W (Bigtable) and asynchronously to R (MySQL); a read query is sent asynchronously to W (Bigtable) and synchronously to R (MySQL).]
  • 21. • W: write-optimized (e.g. Bigtable) • R: read-optimized (e.g. MySQL) • RW: memory-based (e.g. Redis)
       MyCassandra Cluster keeps the same consistency strength as Cassandra.
       Quorum protocol: (write agreements) + (read agreements) > (replicas)
       • This protocol guarantees that a read reaches at least one replica holding the most recent value.
       The system therefore needs one node that synchronously processes both read and write queries: a memory-based node (Redis).
       [Figure: replicas W (Bigtable), RW (Redis), and R (MySQL); the RW node takes part synchronously in both writes and reads.]
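The quorum inequality on this slide can be checked by brute force for small clusters. The following Python sketch (function names are hypothetical) enumerates every possible set of acknowledging writers and responding readers and confirms that they always share a node exactly when the inequality holds for the tested sizes.

```python
from itertools import combinations


def satisfies_quorum(write_agreements: int, read_agreements: int, replicas: int) -> bool:
    """The condition from the slide: write + read agreements exceed the replica count."""
    return write_agreements + read_agreements > replicas


def always_intersect(write_agreements: int, read_agreements: int, replicas: int) -> bool:
    """True if every write set shares at least one node with every read set."""
    nodes = range(replicas)
    return all(
        set(ws) & set(rs)
        for ws in combinations(nodes, write_agreements)
        for rs in combinations(nodes, read_agreements)
    )


# With 3 replicas and 2 agreements each way, any read overlaps the latest write.
ok = satisfies_quorum(2, 2, 3) and always_intersect(2, 2, 3)
```

With 3 replicas, 2 write agreements and 2 read agreements always overlap in at least one node, which is why one replica has to take part synchronously in both paths.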
  • 22. • W: write-optimized (e.g. Bigtable) • R: read-optimized (e.g. MySQL) • RW: memory-based (e.g. Redis)
       Replicas = 3, write agreements = 2, W:RW:R = 1:1:1
       1) The proxy broadcasts the write query to the nodes responsible for the record.
       2) The proxy waits for two acks.
       3a) Write success: the proxy returns a success message to the client.
       3b) Write failure: the proxy keeps waiting until the total number of acks reaches the quorum.
       4) The proxy asynchronously waits for acks from the remaining nodes.
       Write latency: max (W, RW)
       [Figure: Client → Proxy → nodes W, RW, R responsible for a record; the proxy waits for two acks before returning, and the remaining write is asynchronous.]
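A minimal simulation of this write path, assuming Python's `asyncio` and made-up per-node delays (the node names match the slide, but the numbers and the function names `send_write`, `quorum_write`, and `demo` are illustrative only). The proxy broadcasts to all three replicas, returns once two acks arrive, and leaves the slowest write to finish in the background.

```python
import asyncio
from typing import Dict, List, Tuple


async def send_write(node: str, delay_s: float) -> str:
    """Pretend to write on one replica; the sleep stands in for its write latency."""
    await asyncio.sleep(delay_s)
    return node


async def quorum_write(
    delays: Dict[str, float], quorum: int
) -> Tuple[List[str], List["asyncio.Future[str]"]]:
    """Broadcast to every replica and return as soon as `quorum` acks arrive."""
    tasks = [asyncio.ensure_future(send_write(n, d)) for n, d in delays.items()]
    acked: List[str] = []
    for next_done in asyncio.as_completed(tasks):
        acked.append(await next_done)
        if len(acked) >= quorum:
            break
    # Writes still in flight continue asynchronously (step 4 on the slide).
    pending = [t for t in tasks if not t.done()]
    return acked, pending


async def demo() -> Tuple[List[str], List[str]]:
    # Hypothetical write latencies in seconds: R (MySQL) is the slow writer.
    delays = {"W": 0.010, "RW": 0.005, "R": 0.050}
    acked, pending = await quorum_write(delays, quorum=2)
    late = await asyncio.gather(*pending)
    return acked, list(late)
```

Under these assumed delays the client-visible latency is set by the slower of W and RW, matching the slide's max (W, RW), while the R node's write completes after the reply.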
  • 23. • W: write-optimized (e.g. Bigtable) • R: read-optimized (e.g. MySQL) • RW: memory-based (e.g. Redis)
       Replicas = 3, read agreements = 2, W:RW:R = 1:1:1
       1) The proxy sends a data request to an R or RW node and a digest request to the other replicas.
       2) The proxy waits for replies that include the specified record.
       3a) Success: if the record and the digests are consistent, the proxy returns the record to the client.
       3b) Failure or inconsistency: the proxy keeps reading and collecting digests until they satisfy the quorum.
       4) After replying to the client, the proxy waits for the remaining nodes. If there is an inconsistency, it is resolved using Read Repair.
       Read latency: max (R, RW)
       [Figure: Client → Proxy → nodes W, RW, R responsible for a record; the proxy checks consistency and returns the result, and the remaining consistency check is asynchronous.]
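The read steps above can be sketched as follows. This is a simplified, synchronous Python model, not MyCassandra's code: replicas are plain dicts, each value carries a timestamp, inconsistency is resolved by taking the newest timestamp, and the digest is an MD5 of the value. The function names and the in-memory replica layout are assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

Versioned = Tuple[str, int]          # (value, timestamp)
Replica = Dict[str, Versioned]       # row key -> versioned value


def digest(value: str) -> str:
    """Short fingerprint sent instead of the full record."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def quorum_read(
    replicas: Dict[str, Replica], key: str, data_node: str, digest_nodes: List[str]
) -> str:
    """Read the record from one node and compare it against digests from others."""
    value, _ = replicas[data_node][key]
    digests = [digest(replicas[n][key][0]) for n in digest_nodes]
    if all(d == digest(value) for d in digests):
        return value  # step 3a: record and digests agree
    # Step 3b/4 (simplified): fetch full versions, keep the newest, repair the rest.
    contacted = [data_node, *digest_nodes]
    newest = max((replicas[n][key] for n in contacted), key=lambda vt: vt[1])
    for n in contacted:
        replicas[n][key] = newest  # read repair
    return newest[0]


cluster = {
    "W": {"sato": ("Gold", 2)},
    "RW": {"sato": ("Gold", 2)},
    "R": {"sato": ("Bronze", 1)},    # stale replica: missed the latest write
}
result = quorum_read(cluster, "sato", data_node="R", digest_nodes=["RW"])
```

In this example the stale R node disagrees with RW's digest, so the read returns the newer value and R is repaired in passing.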
  • 24. [Chart: max. qps for 40 clients, Cassandra vs. MyCassandra Cluster, by write:read ratio: Write-Only [100:0], Write-Heavy [50:50], Read-Heavy [5:95], Read-Only [0:100]; y-axis 0 to 20,000 queries/sec, higher is better. Ratio labels on the chart: ×0.90, ×6.53, ×1.54, ×0.93.]
       • YCSB / Zipfian
       • Throughput was up to 6.53 times that of Cassandra.
       • In the Write-Heavy workload, multiple read repairs occur.
  • 25. MyCassandra-0.2.2
       • secondaryIndex: applied to MySQL and MongoDB
       MyCassandra-0.3.0
       • Based on Cassandra-0.8
       • Atomic counter
       • Brisk (Hadoop + Cassandra)…
  • 26. 1.  Asynchronous deletion 2.  Engine failure detection 3.  Support for ad hoc query
  • 27. Cassandra’s delete/expire operation
       • Logical deletion using tombstones
       • Actual deletion during SSTable compaction
       This approach depends on the Bigtable engine.
       MyCassandra (MySQL, Redis, …)
       • Synchronous deletion (for now)
       • The expire function works, but the data continues to exist.
       • Asynchronous deletion is a heavy operation: it requires I/O on one big table, unlike an SSTable, which is a subset of the data.
  • 28. When only the storage engine fails: how to detect the failure, and how the instance should behave.
       With several storage engines and a partial failure: how the instance should behave.
       [Figure: an instance detects an engine failure by periodic polling and asks "What should I do?": treat it as an overall failure (node down), or take over the other node?]
  • 29. Ad hoc queries and data models
       • If a feature does not depend on the distributed architecture, it can be added easily:
         data model of Redis (List, Set, ..); document data model and ad hoc queries of MongoDB.
       • If it does depend on it, it cannot be supported:
         atomic queries across multiple keys; joins.
       It is important to determine whether a query depends on the distributed mechanism.
  • 30. GitHub
       • https://github.com/sunsuk7tp/MyCassandra/
       Twitter
       • @MyCassandraJP
       • @_MyCassandra # @MyCassandra had already been taken!!
       • @sunsuk7tp # my private account
       Google Groups
       • https://groups.google.com/group/my-cassandra