(24)
•  @sunsuk7tp
•          /P.A. WORKS              /
•                CS M2
•          :
       : HPC
          TSUBAME
          MPI, Cell B.E., GPU CUDA, Hadoop on
              :
         
                          , P2P
                          NoSQL Afternoon in Japan (10.11.1,           )
          SACSIS 2011
•  Web                                    6
     PHP, Perl, JavaScript
    
          Apache Solr, MySQL
          NoSQL
                  NoSQL
•        Jazz,     trumpet
•  Cassandra 0.6.0
     @railute   @yutuki_r                    @techmemo
                                                    Itmedia
                                                      3
                                                    http://lab.jibun.atmarkit.co.jp/entries/1058
+

    NoSQL, Key-Value Store (KVS), Document-Oriented DB, GraphDB
       : memcached, Google Bigtable, Amazon Dynamo, Amazon SimpleDB, Apache Cassandra,
     Voldemort, Ringo, Vpork, MongoDB, CouchDB, Tokyo Tyrant, Flare, ROMA, kumofs, Kai, Redis,
     LevelDB, Hadoop HBase, Hypertable, Yahoo! PNUTS, Scalaris, Dynomite, ThruDB, Neo4j, IBM
     ObjectGrid, Oracle Coherence,       100
                                        :               ↔
      • 
      •  join, transaction
      •                      /
/DC




                                •  decentralized
                                • 

     •  master/slave
     •  data/meta/proxy
     • 
•    •  Map Reduce
• 
  SPOF
     DC
              dc1          dc2



               rack/dc
           region
                     dc3
 
       • 
   
       •  (   )                          <<                       &
       •                    , correlated failure
      SPOF = “                  ”
   
       •          : 1
       •                :            /
Daniel Ford et al. (Google), “Availability in Globally Distributed Storage Systems”, OSDI 2010
 


     ⇒               !!

         SPOF



                ~
 SPOF
         decentralized
     •    proxy/master/slave
 
Consistent Hashing
[Figure: hash ring over the ID space A~Z with nodes A, F, N, Q, V, Z; N := 3 replicas.
 Labels: request proxy, primary node, secondary node 1, secondary node 2.
 Example: hash(key) = Q for a key/values pair.]
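A minimal sketch of the replica lookup on such a ring, using the node IDs from the slide (A, F, N, Q, V, Z) and three replicas. It assumes the common convention that the primary is the first node at or after the key's ring position and the secondaries are the next nodes clockwise; the `HashRing` class and the MD5-based mapping onto A~Z are hypothetical, not taken from Cassandra or MyCassandra.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring; node IDs double as ring positions (tokens)."""

    def __init__(self, node_ids, replicas=3):
        self.tokens = sorted(node_ids)
        self.replicas = min(replicas, len(self.tokens))

    def nodes_for_token(self, token):
        """Return [primary, secondary 1, secondary 2, ...] for a ring position."""
        # First node whose token is >= the position, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        return [self.tokens[(start + i) % len(self.tokens)]
                for i in range(self.replicas)]

    def nodes_for_key(self, key):
        # Hypothetical hash: map the key onto the A~Z ID space via MD5.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        token = chr(ord("A") + digest[0] % 26)
        return self.nodes_for_token(token)


ring = HashRing(["A", "F", "N", "Q", "V", "Z"], replicas=3)
print(ring.nodes_for_token("Q"))  # ['Q', 'V', 'Z']
print(ring.nodes_for_token("R"))  # ['V', 'Z', 'A']
```

Adding or removing one node only moves the keys between that node and its ring neighbours, which is why no central lookup table is needed.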
MyCassandra
SQL                                     map


                    Megastore
                     library
   relational
  data model              table
                    (multi-dimensional
                      sorted map)


(sorted) records     (sorted) map
                                         (sorted) map
   + indices           + indices

    RDB            Bigtable                KVS
 NoSQL
PNUTS (VLDB ‘08): MySQL NoSQL   YCSB (SOCC ’10):
Write-Heavy       Read-Heavy



                                  write-
                                optimized
Better




                 read-                        read-
               optimized                    optimized



                 write-
               optimized
     System             Optimized for     Storage         Architecture
     Apache HBase       write optimized   Bigtable like   centralized
     Apache Cassandra   write optimized   Bigtable like   decentralized
     Sharded MySQL      read optimized    MySQL           centralized
     Yahoo! Sherpa      read optimized    MySQL           centralized



       :


⇒              Cassandra                             MySQL
MyCassandra
= Dynamo + Bigtable
= Dynamo + Bigtable
      (P2P/decentralized)
= Dynamo +
     (P2P/decentralized)
                         RDBMS
            Table


               •      /
               • 
               • 




              NoSQL               !!
     query
MyCassandra
= Dynamo +
     (P2P/decentralized)
MySQL
= Dynamo +   Bigtable
              Redis
                :
1


    (master/worker, sharding,
       consistent hashing)




                     •  cache / persistence
                     •  index
                     •  write/read-optimized
                     • 
+




    MyCassandra
  InnoDB (MySQL 5.1~     )
  MyISAM
  Memory
  Merge
  Archive
  Federated
  NDB
  CSV
  Blackhole ( )
  FALCON
  MariaDB
  Drizzle              InnoDB/MyISAM
  solidDB
                        MySQL Cluster
  :
  MySQL:
  Bigtable:   Cassandra
  Redis:       /      snapshot
  MongoDB:                       DB




     
     
                                 decentralized
     • 
            RDB (MySQL / PostgreSQL)
     •  master/slave     decentralized
          MongoDB / Redis

 
     •  MapReduce
                 MySQL                                 Bigtable
          MySQL (InnoDB) INSERT
             Bigtable                     INSERT/GET
     • 
                 /              /

  EC2+RDS                 MyCassandra
            /
 I/O
     •  Bigtable (LSM-tree)
     •  MySQL (B-trees/ )
     •  Redis (Hash)
     •  MongoDB (B-tree)
     •  KyotoCabinet (B+ tree/hash)
                 hash             B-Trees          LSM-Tree
 write           1 random I/O     1 random I/O     append
 read            1 random I/O     1 random I/O     N random I/O + merge
                 cache
 examples        Memcached,       MySQL,           Cassandra,
                 Redis,           MongoDB,         HBase,
                 KyotoCabinet     KyotoCabinet     LevelDB

 


 
+
[Figure: write path of an LSM-Tree [P. O’Neil ‘96].
 A write is appended to the Commit Log on disk (<k1, v1>, <k1, v2>) with a sequential write
 and applied synchronously to the Memtable in memory (<k1, obj (v1+v2)>).
 The Memtable is flushed asynchronously to SSTable 1, 2, 3 on disk (<k1,obj1>, <k1,obj2>, <k1,obj3>).
 Labels: O(1), sequential write I/O; always writable; write-lock.]
+
[Figure: read path for a key.
 •  Memtable (memory): value <k1,obj>
 •  SSTable 1, 2, 3 (disk I/O): values <k1,obj1>, <k1,obj2>, <k1,obj3>
 The values are merged and <k1,obj+obj1~3> is returned to the client.]
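The write and read paths can be sketched in a few lines. This is a toy model under stated assumptions, not Cassandra's implementation: the `TinyLSM` class is hypothetical, the "disk" structures are plain Python lists, and the flush runs inline instead of asynchronously.

```python
class TinyLSM:
    """Toy LSM-tree: commit log + memtable + immutable sorted tables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only; stands in for the on-disk log
        self.memtable = {}     # in-memory: key -> list of values
        self.sstables = []     # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Write path: one sequential append plus an in-memory update.
        self.commit_log.append((key, value))
        self.memtable.setdefault(key, []).append(value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Dump the memtable as a key-sorted table (inline here, async in the slide).
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def get(self, key):
        # Read path: consult every SSTable and the memtable, then merge.
        merged = []
        for table in self.sstables:
            merged.extend(table.get(key, []))
        merged.extend(self.memtable.get(key, []))
        return merged


db = TinyLSM(memtable_limit=2)
db.put("k1", "v1")
db.put("k2", "x")      # second key fills the memtable and triggers a flush
db.put("k1", "v2")     # lands in the fresh memtable
print(db.get("k1"))    # ['v1', 'v2']
```

A write never reads existing data, while a read may have to touch every table that could hold the key; that asymmetry is what makes this layout write-optimized.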
+
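[Figure: latency histograms for write and read (x-axis: Latency (ms), y-axis: Number of queries; lower latency is better).
 write: avg. 0.69 ms, 99.9 percentile 2.0 ms
 read:  avg. 6.16 ms, 99.9 percentile 86.9 ms
 Average write latency is about 1/9 of average read latency.]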
[Figure: Max. QPS for 40 Clients (qps, axis 0 to 40000; higher is better).
 Series: Bigtable, MySQL, Redis.
 Workloads: Write Only, Write Heavy, Read Heavy, Read Only.]
           /            /
        /99%/Max/
 
 
                     ( KB~       MB)
    HDD/SSD
                  (zipfian, uniform, latest)
 
     •  Embedded InnoDB, KyotoCabinet

#                               ( )
select
proxy
  client
                                client
    •  o.a.c.cli
    •  o.a.c.avro/thrift                                      server
  proxy
    •  o.a.c.service.StorageProxy
  server                                                     engine
   •  o.a.c.service.StorageService
   •  o.a.c.db.ReadVerbHandler/RowMutationVerbHandler
  engine
   •  o.a.c.db.Table (keyspace       )
        o.a.c.db.commitlog
        o.a.c.db.ColumnFamilyStore (columnfamily       )
         o.a.c.db.engine.StorageEngineInterface
         o.a.c.db.engine.MySQLInstance, RedisInstance, MongoDBInstance, …
 
     •  put (key, cf)
                                               OK
     •  get (key)
     •  getRangeSlice (startWith, endWith, maxResults)
     •  truncate/dropTable/dropDB
 
     •  secondaryIndex
     •  expire
     •  counter (Cassandra-0.8    )
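The operations listed above suggest a small pluggable-engine contract. The sketch below is an illustration only: `StorageEngine` and `InMemoryEngine` are hypothetical Python names, not the Java `StorageEngineInterface` from the MyCassandra source, and the inclusive key range in `get_range_slice` is an assumption.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical contract mirroring put / get / getRangeSlice / truncate."""

    @abstractmethod
    def put(self, key, cf):
        """Store a serialized column family under a row key."""

    @abstractmethod
    def get(self, key):
        """Return the stored column family, or None if absent."""

    @abstractmethod
    def get_range_slice(self, start_with, end_with, max_results):
        """Return up to max_results (key, cf) pairs in key order."""

    @abstractmethod
    def truncate(self):
        """Remove all rows."""


class InMemoryEngine(StorageEngine):
    """Dict-backed stand-in for a MySQL, Redis, or MongoDB engine."""

    def __init__(self):
        self.rows = {}

    def put(self, key, cf):
        self.rows[key] = cf
        return True

    def get(self, key):
        return self.rows.get(key)

    def get_range_slice(self, start_with, end_with, max_results):
        # Assumption: both range bounds are inclusive.
        keys = sorted(k for k in self.rows if start_with <= k <= end_with)
        return [(k, self.rows[k]) for k in keys[:max_results]]

    def truncate(self):
        self.rows.clear()


engine = InMemoryEngine()
engine.put("sato", "gender;male;age;17")
engine.put("suzuki", "gender;female;age;21;region;Tokyo")
print(engine.get_range_slice("a", "z", 1))  # [('sato', 'gender;male;age;17')]
```

Keeping the contract this narrow is what lets engines with very different internals sit behind the same node.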
    Cassandra
     •         : keyspace – columnfamily – column
     •              key/value(             )
     • 
            ColumnFamily        SSTable <key, value>
            value: columnFamily
     Keyspace
                  ColumnFamily A                      ColumnFamily B
      key col gender     age      region     key       col visits plan
      sato      male     17       [null]     sato         18     Gold
      suzuki    female   21       Tokyo      suzuki       214    Bronze

                              Bigtable (Cassandra)
          Cassandra
     •  Super Column
 SSTable              key-value
     • 

                                  KVS
 key prefix
     • 
Cassandra       MySQL      Redis
keyspace        database   db
column family   table      record
column          field
RDB (MySQL): database
    table A                                      table B
    key      values                              key      values
    sato     gender;male;age;17                  sato     visits;18;plan;Gold
    suzuki   gender;female;age;21;region;Tokyo   suzuki   visits;214;plan;Bronze

KVS (Redis): db
    key        values
    A:sato     …
    B:ito      …
    A:suzuki   …
    B:tanaka   …

Bigtable (Cassandra): keyspace
    columnfamily A                            columnfamily B
    key/col  gender   age   region            key/col  visits   plan
    sato     male     17    [null]            sato     18       Gold
    suzuki   female   21    Tokyo             suzuki   214      Bronze
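A minimal sketch of the mapping shown above: a row's columns are flattened into one `name;value;...` string for the RDB engine, and the column family name becomes a key prefix for the KVS engine. The function names are hypothetical, and the sketch assumes names and values never contain `;`.

```python
def serialize_columns(columns):
    """{'gender': 'male', 'age': 17} -> 'gender;male;age;17' (null columns skipped)."""
    parts = []
    for name, value in columns.items():
        if value is None:
            continue  # a [null] column is simply left out, as for "sato" above
        parts.extend([name, str(value)])
    return ";".join(parts)


def deserialize_columns(blob):
    """Inverse of serialize_columns; all values come back as strings."""
    fields = blob.split(";") if blob else []
    return dict(zip(fields[0::2], fields[1::2]))


def kvs_key(column_family, row_key):
    """Key prefix scheme for a flat KVS: 'A' + 'sato' -> 'A:sato'."""
    return f"{column_family}:{row_key}"


row = {"gender": "male", "age": 17, "region": None}
print(serialize_columns(row))                     # gender;male;age;17
print(deserialize_columns("visits;18;plan;Gold")) # {'visits': '18', 'plan': 'Gold'}
print(kvs_key("A", "sato"))                       # A:sato
```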
 
      •  MySQL database = keyspace :=>
           MyCassandra (MySQL)
      •  MySQL table = keyspace :=>
           Cassandra               Bigtable (Cassandra)
keyspace
                 columnfamily A                           columnfamily B
  key col gender          age     region         key       col visits plan
  sato       male         17      [null]         sato            18    Gold
  suzuki     female       21      Tokyo          suzuki          214   Bronze

                                                                  MySQL
                           gender          age   region    visits      plan
                 sato      male            17    [null]    18          Gold
         Table
                 suzuki    female          21    Tokyo     214         Bronze
 


                                         1
 
 secondary index
     rowKey CF          counter   secondary   token
                                  index
           Serialized
           Object
     Key     Value

                         Key-Value KVS                …
 
     • 

     • 
     • 

                write query            read query

            sync         async    async         sync


            W             R        W                R
          Bigtable       MySQL   Bigtable      MySQL
•  W:
                                •  R:
                                •  RW:
 
 
                                    write query

                                sync              async


                                W                  R
 
Quorum Protocol: (write agreements) + (read agreements) > (replicas)
     • 

                                write             read



                                W        RW       R
•  :
                                                               •  R:
                                                               •  RW:

 replicas = 3, quorum = 2
                                    Client
W:RW:R = 1:1:1              Proxy
                                             1) 


                                             2)    W, RW

                      ACK
                                                                        ACK

                                             3a)
      W          RW           R
                                             3b)           R

                                                                          ACK
                 : max (W, RW)
•  :
                                                             •  R:
                                                             •  RW:
 replicas = 3, quorum = 2
W:RW:R = 1:1:1                   Client
                     Proxy
                                          1) 


                                          2)    R, RW

                                          3a)
                                          3b)       or
                                                W
     W     RW         R
                                          4) 
                                                  .
                                                (Cassandra read repair   )
                 : max (R, RW)
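The latency figures max (W, RW) for writes and max (R, RW) for reads follow from waiting for two acknowledgements out of three replicas. A small sketch, assuming one replica on each node type and made-up latencies (the numbers and function names are hypothetical):

```python
def quorum_overlaps(replicas, write_acks, read_acks):
    """Quorum condition: every read set intersects every write set."""
    return write_acks + read_acks > replicas


def quorum_latency(latencies_ms, acks_needed):
    """Latency of a request that returns after the acks_needed fastest replies."""
    return sorted(latencies_ms.values())[acks_needed - 1]


# Hypothetical per-node latencies in ms: W nodes write fast, R nodes read fast.
write_ms = {"W": 1.0, "RW": 2.0, "R": 9.0}
read_ms = {"W": 9.0, "RW": 2.0, "R": 1.0}

print(quorum_overlaps(3, 2, 2))     # True: 2 + 2 > 3
print(quorum_latency(write_ms, 2))  # 2.0 == max(W, RW)
print(quorum_latency(read_ms, 2))   # 2.0 == max(R, RW)
```

The slow node type for each operation stays off the critical path: its reply arrives after the request has already completed.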
[Figure: max. qps for 40 clients (query/sec, axis 0 to 20000; higher is better), Cassandra vs. MyCassandra Cluster.
 Values printed above the bars:]
    workload      [write:read]   value
    Write-Only    [100:0]        0.90
    Write-Heavy   [50:50]        0.93
    Read-Heavy    [5:95]         1.54
    Read-Only     [0:100]        6.49

                    Write Heavy                    Read Heavy
                  • YCSB / Zipfian
                  •                                     6.49
                  • 
  https://github.com/sunsuk7tp/MyCassandra
  MyCassandra-0.2.0 (      )
     •  based on Cassandra-0.7.5
     •  Basic CRUD on a simple record
     •  RangeSlice
     •  keyspace
1.         cassandra.yaml
      •       engine host, port, …
      •     default engine
2.                                         (           )
3.         MyCassandra               (Cassandra   )
4.                            or           keyspace,
           columnfamily
      •     engine              (keyspace   )
      •                   (column family  )
    Embedded InnoDB
     •  HailDB:                      …
     •  Handler Socket:                            …
     •  ExtraDB
     •  API
    DBM (KyotoCabinet)
     •  KyotoCassandra/Kyossandra/ ssandra (   )
     • 
     •  NoSQL
     •  QDBM, TC       Hash or B+Tree db
•            /
•  hash/B+tree
• 
class            persistence   algorithm        lock unit
ProtoHashDB      volatile      hash             whole (rwlock)
ProtoTreeDB      volatile      red black tree   whole (rwlock)
StashDB          volatile      hash             record (rwlock)
CacheDB          volatile      hash             record (mutex)
GrassDB          volatile      B+ tree          page (rwlock)
HashDB           persistent    hash             record (rwlock)
TreeDB           persistent    B+ tree          page (rwlock)
DirDB            persistent    undefined        record (rwlock)
ForestDB         persistent    B+ tree          page (rwlock)
 MyCassandra-0.2.2
 •  secondaryIndex
      MySQL MongoDB
 MyCassandra-0.3.0
 •  Based on Cassandra-0.8
 •  Atomic counter
 •  Brisk (Hadoop + Cassandra)…
1. 
2. 
3. 
    Cassandra             /expire
     •  tombstone
     •                SSTable
     •  Bigtable like

 MyCassandra            Bigtable
  • 
  •  expire
  • 
       1  Table
 




 


 instance        instance   instance

         ping                        detect
engine          engine      engine            instance   ?
                                                             ?

                node down                                ?
 
     • 

            Redis
            MongoDB
           
     • 
                 key
            Join

 
 
     • 
 
     •  Cassandra-0.6         :
               GC
           
     •  Cassandra-0.7, 0.8:
         
         
         
         

                                 …
 Issue
  •  https://github.com/sunsuk7tp/MyCassandra/issues
 Twitter
  •  @MyCassandraJP
  •  @_MyCassandra # @MyCassandra                orz
  •  @sunsuk7tp #


 Google    Groups
  •  https://groups.google.com/group/my-cassandra
               / @railute
     •                       Cassandra
    Gemini Mobile Technologies / @geminimobile
     •              Hibari
               / @yutuki_r
     •  Cassandra               twitter
    dann / @techmemo
     •  Cassandra
               / @tatsuya6502
     •  YCSB         , Hibari
               / @mikio1978 / @fallabs
     •  KyotoCabinet
             / @muga_nishizawa
           / @Nakata_itpro
             / @shudo
    Cassandra
 
    UST                        (         )
17th Cassandra Study Group: MyCassandra

Uploaded by Shun Nakamura
Dernier

Dernier (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
presentation ICT roal in 21st century education
17th Cassandra Study Meeting (Cassandra勉強会): MyCassandra

  • 1.
  • 2. (24) •  @sunsuk7tp •  /P.A. WORKS / •  CS M2 •  :   : HPC   TSUBAME   MPI, Cell B.E., GPU CUDA, Hadoop on   :     , P2P   NoSQL Afternoon in Japan (10.11.1, )   SACSIS 2011 •  Web 6   PHP, Perl, JavaScript     Apache Solr, MySQL   NoSQL   NoSQL •  Jazz, trumpet •  Cassandra 0.6.0   @railute @yutuki_r @techmemo Itmedia 3 http://lab.jibun.atmarkit.co.jp/entries/1058
  • 3. +   NoSQL, Key-Value Store (KVS), Document-Oriented DB, GraphDB : memcached, Google Bigtable, Amazon Dynamo, Amazon SimpleDB, Apache Cassandra, Voldemort, Ringo, Vpork, MongoDB, CouchDB, Tokyo Tyrant, Flare, ROMA, kumofs, Kai, Redis, LevelDB, Hadoop HBase, Hypertable, Yahoo! PNUTS, Scalaris, Dynomite, ThruDB, Neo4j, IBM ObjectGrid, Oracle Coherence, 100   : ↔ •  •  join, transaction •  /
  • 4. /DC •  decentralized •  •  master/slave •  data/meta/proxy •  •  •  Map Reduce • 
  • 5.   SPOF   DC dc1 dc2 rack/dc region dc3
  • 6.   •    •  ( ) << & •  , correlated failure   SPOF = “ ”   •  : 1 •  : / ( ) Daniel Ford et al. (Google), “Availability in Globally Distributed Storage Systems”, OSDI 2010
  • 7.   ⇒  !!   SPOF   ~ SPOF
  • 8.   decentralized •  proxy/master/slave  
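  • 9. Consistent Hashing: nodes are placed on an ID ring (A~Z), with N := 3 replicas. A request arrives at a proxy node; hash(key) = Q makes node Q the primary node for the <key, values> pair, and the nodes that follow it on the ring (V, …) are the secondary nodes (secondary 1, secondary 2).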
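The replica selection on slide 9 can be illustrated with a minimal sketch. This is not Cassandra's partitioner: the class name `ConsistentHashRing`, the MD5-based position function, and node "K" are assumptions made for the example. Only the idea comes from the slide: hash the key onto the ring, take the first node at or after that position as primary, and take the next N-1 nodes as secondaries.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy consistent-hash ring (hypothetical helper, not Cassandra code)."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas  # N on the slide
        # Place every node on the ring and keep the ring sorted by position.
        self._ring = sorted((self._position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    @staticmethod
    def _position(value):
        # MD5 is only a stable hash here (an assumption for the sketch).
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")

    def preference_list(self, key):
        """Return [primary, secondary 1, secondary 2, ...] for a key."""
        start = bisect.bisect_left(self._positions, self._position(key))
        count = min(self.replicas, len(self._ring))
        # Walk clockwise from the primary, wrapping around the ring.
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = ConsistentHashRing(["A", "F", "K", "Q", "V", "Z"], replicas=3)
primary, *secondaries = ring.preference_list("sato")
print("primary:", primary, "secondaries:", secondaries)
```

Because only ring neighbours are involved, adding or removing one node moves only the keys adjacent to it, which is what makes the scheme suitable for a decentralized cluster.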
  • 11. SQL map Megastore library relational data model table (multi-dimensional sorted map) (sorted) records (sorted) map (sorted) map + indices + indices RDB Bigtable KVS NoSQL
  • 12. PNUTS (VLDB ’08): MySQL NoSQL YCSB (SoCC ’10):
  • 13. [Figure: write-optimized vs. read-optimized stores under Write-Heavy and Read-Heavy workloads; arrow marks "Better"]
  • 14.
      Apache HBase      write optimized  Bigtable like  centralized
      Apache Cassandra  write optimized  Bigtable like  decentralized
      Sharded MySQL     read optimized   MySQL          centralized
      Yahoo! Sherpa     read optimized   MySQL          centralized
      : ⇒  Cassandra MySQL
  • 16. = Dynamo + Bigtable
  • 17. = Dynamo + Bigtable (P2P/decentralized)
  • 18. = Dynamo + (P2P/decentralized)
  • 19.   RDBMS   Table •  / •  •  NoSQL !! query
  • 21. = Dynamo + (P2P/decentralized)
  • 22. MySQL = Dynamo + Bigtable Redis :
  • 23.
  • 24. 1 (master/worker, sharding, consistent hashing) •  cache / persistence •  index •  write/read-optimized • 
  • 25. + MyCassandra
  • 26.   InnoDB (MySQL 5.1~ )   MyISAM   Memory   Merge   Archive   Federated   NDB   CSV   Blackhole ( )   FALCON   MariaDB   Drizzle InnoDB/MyISAM   solidDB MySQL Cluster   :
  • 27.   MySQL:   Bigtable: Cassandra   Redis: / snapshot   MongoDB: DB    
  • 28.   decentralized •    RDB (MySQL / PostgreSQL) •  master/slave decentralized   MongoDB / Redis   •  MapReduce   MySQL Bigtable   MySQL (InnoDB) INSERT   Bigtable INSERT/GET •    / /   EC2+RDS MyCassandra
  • 29.   /  I/O •  Bigtable (LSM-tree) •  MySQL (B-trees/ ) •  Redis (Hash) •  MongoDB (B-tree) •  KyotoCabinet (B+ tree/hash)
  • 30. hash B-Trees LSM-Tree write 1 random I/O append read 1 random I/O N random I/O + merge cache Memcached, MySQL, Cassandra, Redis, MongoDB, HBase, KyotoCabinet KyotoCabinet LevelDB    
  • 31. Write path of the LSM-Tree [P. O’Neil ’96]: O(1), sequential write I/O only; always writable, not blocked by a write-lock. Writes <k1, v1>, <k1, v2> are appended to the Commit Log (sequential disk write) and applied synchronously to the in-memory Memtable as <k1, obj (v1+v2)>; the Memtable is flushed to disk asynchronously as SSTables (SSTable 1 <k1,obj1>, SSTable 2 <k1,obj2>, SSTable 3 <k1,obj3>).
  • 32. Read path: the value for a key may be spread over the Memtable (memory) and several SSTables (disk I/O for each). <k1,obj> from the Memtable is merged with <k1,obj1> from SSTable 1, <k1,obj2> from SSTable 2, and <k1,obj3> from SSTable 3, and the client receives <k1,obj+obj1~3>.
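The write and read paths of slides 31 and 32 can be sketched in a few lines. This is a toy model under stated assumptions, not Cassandra's implementation: `TinyLSM`, `memtable_limit`, and the use of Python lists and dicts in place of on-disk files are all invented for illustration. It shows why a write is a cheap append while a read may have to merge several tables.

```python
class TinyLSM:
    """Toy LSM-tree: commit log + memtable + immutable SSTables (all in memory)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stands in for the sequential on-disk log
        self.memtable = {}     # in-memory buffer: key -> columns
        self.sstables = []     # flushed, immutable tables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, columns):
        # 1) Append to the commit log (sequential write on a real disk).
        self.commit_log.append((key, dict(columns)))
        # 2) Merge into the memtable; no existing SSTable is touched.
        self.memtable.setdefault(key, {}).update(columns)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a new key-sorted SSTable, then reset it.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()

    def get(self, key):
        # Reads consult every SSTable plus the memtable and merge the pieces,
        # letting newer tables override older ones.
        merged, found = {}, False
        for table in self.sstables + [self.memtable]:
            if key in table:
                merged.update(table[key])
                found = True
        return merged if found else None


db = TinyLSM(memtable_limit=1)
db.put("k1", {"v1": 1})
db.put("k1", {"v2": 2})
db.put("k1", {"v3": 3})
print(len(db.sstables), db.get("k1"))
```

With `memtable_limit=1` each write ends up in its own SSTable, so the final `get` has to visit three tables, which mirrors the "N random I/O + merge" read cost attributed to the LSM-Tree.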
  • 33. [Figure: latency distribution, number of queries vs. latency (ms), average and 99.9 percentile. Average: read 6.16 ms, write 0.69 ms (about 1/9). 99.9 percentile: write 2.0 ms, read 86.9 ms.]
  • 34. [Figure: Max. QPS for 40 Clients, comparing Bigtable, MySQL, and Redis over the Write Only, Write Heavy, Read Heavy, and Read Only workloads; y-axis in qps, 0 to 40000]
  • 35.   / /   /99%/Max/       ( KB~ MB)   HDD/SSD   (zipfian, uniform, latest)   •  Embedded InnoDB, KyotoCabinet # ( )
  • 37. proxy   client client •  o.a.c.cli •  o.a.c.avro/thrift server   proxy •  o.a.c.service.StorageProxy   server engine •  o.a.c.service.StorageService •  o.a.c.db.ReadVerbHandler/RowMutationVerbHandler   engine •  o.a.c.db.Table (keyspace )   o.a.c.db.commitlog   o.a.c.db.ColumnFamilyStore (columnfamily )   o.a.c.db.engine.StorageEngineInterface   o.a.c.db.engine.MySQLInstance, RedisInstance, MongoDBInstance, …
  • 38.   •  put (key, cf)   OK •  get (key) •  getRangeSlice (startWith, engWith, maxResults) •  truncate/dropTable/dropDB   •  secondaryIndex •  expire •  counter (Cassandra-0.8 )
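The operations listed on slide 38 suggest a small pluggable interface. The sketch below is an assumption-laden illustration in Python, not the actual `o.a.c.db.engine.StorageEngineInterface` (which is Java and whose signatures are not shown in the deck): the class names `StorageEngine` and `InMemoryEngine`, the snake_case method names, and the inclusive range bounds are all hypothetical.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical engine contract mirroring the operations on the slide."""

    @abstractmethod
    def put(self, key, cf):
        """Store a serialized column family under a row key."""

    @abstractmethod
    def get(self, key):
        """Return the stored column family, or None if absent."""

    @abstractmethod
    def get_range_slice(self, start_with, end_with, max_results):
        """Return up to max_results (key, cf) pairs with keys in the range."""

    @abstractmethod
    def truncate(self):
        """Remove every row held by this engine."""


class InMemoryEngine(StorageEngine):
    """Dict-backed stand-in for a MySQL/Redis/MongoDB-backed engine."""

    def __init__(self):
        self._rows = {}

    def put(self, key, cf):
        self._rows[key] = cf
        return True

    def get(self, key):
        return self._rows.get(key)

    def get_range_slice(self, start_with, end_with, max_results):
        # Inclusive bounds are an assumption of this sketch.
        keys = sorted(k for k in self._rows if start_with <= k <= end_with)
        return [(k, self._rows[k]) for k in keys[:max_results]]

    def truncate(self):
        self._rows.clear()


engine = InMemoryEngine()
engine.put("sato", {"gender": "male", "age": 17})
engine.put("suzuki", {"gender": "female", "age": 21})
print(engine.get_range_slice("sato", "suzuki", 10))
```

Keeping the contract this narrow is what lets very different backends sit behind the same Cassandra front end; features such as secondary indexes or expiry would need extra methods.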
  • 39.   Cassandra •  : keyspace – columnfamily – column •  key/value( ) •    ColumnFamily SSTable <key, value>   value: columnFamily Keyspace ColumnFamily A ColumnFamily B key col gender age region key col visits plan sato male 17 [null] sato 18 Gold suzuki female 21 Tokyo suzuki 214 Bronze Bigtable (Cassandra)
  • 40.   Cassandra •  Super Column  SSTable key-value •    KVS key prefix • 
  • 41. Cassandra MySQL Redis keyspace database db column family table record column field
  • 42. The same data in each model:
      RDB (MySQL): database
        table A   key     values
                  sato    gender;male;age;17
                  suzuki  gender;female;age;21;region;Tokyo
        table B   key     values
                  sato    visits;18;plan;Gold
                  suzuki  visits;214;plan;Bronze
      KVS (Redis): db
                  key       values
                  A:sato    …
                  B:ito     …
                  A:suzuki  …
                  B:tanaka  …
      Bigtable (Cassandra): keyspace
        columnfamily A   key     gender  age  region
                         sato    male    17   [null]
                         suzuki  female  21   Tokyo
        columnfamily B   key     visits  plan
                         sato    18      Gold
                         suzuki  214     Bronze
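The mapping on slide 42 can be made concrete with a short sketch. The `name;value;...` string format and the `A:` key prefix are taken from the slide's illustration; the function names are hypothetical, and the assumption that a KVS value uses the same string format is mine (the slide shows only "…" there). A real engine would more likely store a serialized binary object.

```python
def serialize_columns(columns):
    """Flatten a row's columns into 'name;value;name;value', skipping nulls."""
    return ";".join(
        f"{name};{value}" for name, value in columns.items() if value is not None
    )


def to_rdb_row(row_key, columns):
    """RDB layout: one table per column family, one (key, values) row per key."""
    return (row_key, serialize_columns(columns))


def to_kvs_entry(column_family, row_key, columns):
    """KVS layout: the column family name becomes a key prefix."""
    return (f"{column_family}:{row_key}", serialize_columns(columns))


sato = {"gender": "male", "age": 17, "region": None}
print(to_rdb_row("sato", sato))
print(to_kvs_entry("A", "sato", sato))
```

The key prefix is what stands in for a table in a flat key-value store, since such a store has no notion of a column family of its own.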
  • 43.   •  MySQL database = keyspace :=>   MyCassandra (MySQL) •  MySQL table = keyspace :=>   Cassandra Bigtable (Cassandra) keyspace columnfamily A columnfamily B key col gender age region key col visits plan sato male 17 [null] sato 18 Gold suzuki female 21 Tokyo suzuki 214 Bronze MySQL gender age region visits plan sato male 17 [null] 18 Gold Table suzuki female 21 Tokyo 214 Bronze
  • 44.   1   secondary index rowKey CF counter secondary token index Serialized Object Key Value Key-Value KVS …
  • 45.   •  •  •  Write query: sync to the W node (Bigtable), async to the R node (MySQL). Read query: async to the W node (Bigtable), sync to the R node (MySQL).
  • 46. •  W: •  R: •  RW:     write query sync async W R   Quorum Protocol: ( )+ ( )> ( ) •  write read W RW R
  • 47. •  : •  R: •  RW: =3, =2 Client W:RW:R = 1:1:1 Proxy 1)  2)  W, RW ACK ACK 3a) W RW R 3b) R ACK : max (W, RW)
  • 48. •  : •  R: •  RW: =3, =2 W:RW:R = 1:1:1 Client Proxy 1)  2)  R, RW 3a) 3b) or W W RW R 4)  . (Cassandra read repair ) : max (R, RW)
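The latency notes on slides 47 and 48, max (W, RW) for writes and max (R, RW) for reads, can be sketched as follows, together with the standard quorum condition that write and read acknowledgements must sum to more than the replica count. The function names and every latency figure below are made up for illustration, and reading "=3, =2" as three replicas with a quorum of two is my assumption.

```python
def quorum_overlaps(replicas, write_acks, read_acks):
    """Standard quorum condition: write and read sets must share a replica."""
    return write_acks + read_acks > replicas


def write_latency(latency_ms):
    """A write waits synchronously for the W and RW nodes; R is updated async."""
    return max(latency_ms["W"]["write"], latency_ms["RW"]["write"])


def read_latency(latency_ms):
    """A read waits synchronously for the R and RW nodes; W is asked only on mismatch."""
    return max(latency_ms["R"]["read"], latency_ms["RW"]["read"])


# Hypothetical per-node-type latencies in milliseconds (not measured values).
latency_ms = {
    "W": {"write": 0.7, "read": 6.0},   # write-optimized engine
    "RW": {"write": 0.5, "read": 0.5},  # engine that is fast for both
    "R": {"write": 5.0, "read": 1.0},   # read-optimized engine
}

print(quorum_overlaps(replicas=3, write_acks=2, read_acks=2))
print(write_latency(latency_ms), read_latency(latency_ms))
```

In this model the slow side of each engine (reads on W, writes on R) stays off the synchronous path, which is the point of mixing node types in one replica set.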
  • 49. [Figure: max. qps for 40 clients (query/sec, y-axis 0 to 20000), Cassandra vs. MyCassandra Cluster, at write:read ratios [100:0] Write-Only, [50:50] Write-Heavy, [5:95] Read-Heavy, [0:100] Read-Only. Annotations on the chart: 0.90, 6.49, 1.54, 0.93.] • YCSB / Zipfian •  6.49
  • 50.   https://github.com/sunsuk7tp/MyCassandra   MyCassandra-0.2.0 ( ) •  based on Cassandra-0.7.5 •  Basic CRUD on a simple record •  RangeSlice •  keyspace
  • 51. 1.  cassandra.yaml •  engine host, port, … •  default engine 2.  ( ) 3.  MyCassandra (Cassandra ) 4.  or keyspace, columnfamily •  engine (keyspace ) •  (column family )
  • 52.   Embedded InnoDB •  HailDB: … •  Handler Socket: … •  ExtraDB •  API   DBM (KyotoCabinet) •  KyotoCassandra/Kyossandra/ ssandra ( ) •  •  NoSQL •  QDBM, TC Hash or B+Tree db
  • 53. •  / •  hash/B+tree •
      class        persistence  algorithm       lock unit
      ProtoHashDB  volatile     hash            whole (rwlock)
      ProtoTreeDB  volatile     red black tree  whole (rwlock)
      StashDB      volatile     hash            record (rwlock)
      CacheDB      volatile     hash            record (mutex)
      GrassDB      volatile     B+ tree         page (rwlock)
      HashDB       persistent   hash            record (rwlock)
      TreeDB       persistent   B+ tree         page (rwlock)
      DirDB        persistent   undefined       record (rwlock)
      ForestDB     persistent   B+ tree         page (rwlock)
  • 54.  MyCassandra-0.2.2 •  secondaryIndex   MySQL MongoDB  MyCassandra-0.3.0 •  Based on Cassandra-0.8 •  Atomic counter •  Brisk (Hadoop + Cassandra)…
  • 56.   Cassandra /expire •  tombstone •  SSTable •  Bigtable like  MyCassandra Bigtable •  •  expire •    1 Table
  • 57.     instance instance instance ping detect engine engine engine instance ? ? node down ?
  • 58.   •    Redis   MongoDB   •    key   Join  
  • 59.   •    •  Cassandra-0.6 :   GC   •  Cassandra-0.7, 0.8:           …
  • 60.  Issue •  https://github.com/sunsuk7tp/MyCassandra/issues  Twitter •  @MyCassandraJP •  @_MyCassandra # @MyCassandra orz •  @sunsuk7tp #  Google Groups •  https://groups.google.com/group/my-cassandra
  • 61.   / @railute •  Cassandra   Gemini Mobile Technologies / @geminimobile •  Hibari   / @yutuki_r •  Cassandra twitter   dann / @techmemo •  Cassandra   / @tatsuya6502 •  YCSB , Hibari   / @mikio1978 / @fallabs •  KyotoCabinet   / @muga_nishizawa   / @Nakata_itpro   / @shudo   Cassandra     UST ( )