DaStor Evaluation Report for CDR Storage & Query
(DaStor is based on Cassandra, a flawed project)

making data alive!

Schubert Zhang
Big Data Engineering Team
Oct. 28, 2010
Testbed

•   Hardware
    – Cluster with 9 nodes
         • 5 nodes
                –   DELL PowerEdge R710
                –   CPU: Intel(R) Xeon(R) E5520 @ 2.27GHz, cache size = 8192 KB
                –   Cores: 2x quad-core CPUs with Hyper-Threading => 16 logical cores
                –   RAM: 16GB
                –   Hard Disk: 2x 1TB SATA 7.2k rpm, RAID0
         • 4 nodes
                –   DELL PowerEdge 2970
                –   CPU: Quad-Core AMD Opteron(tm) Processor 2378, cache size = 512 KB
                –   Cores: 2x quad-core CPUs => 8 cores
                –   RAM: 16GB
                –   Hard Disk: 2x 1TB SATA 7.2k rpm, RAID0
    – Total
         • 9 nodes, 112 cores, 144GB RAM, 18 hard disks (18TB in total)
    – Network: a single 1Gbps switch
•   Linux: RedHat EL 5.3, Kernel = 2.6.18-128.el5
•   File System: Ext3
•   JDK: Sun Java 1.6.0_20-b02

•   The existing testbed and configuration are not ideal for performance.
•   Preferred setup:
    – Commit Log: dedicated hard disk.
    – File system: XFS/EXT4.
    – More memory to cache more indexes and metadata.

                                                                                                      2
DaStor Configuration

•   Release version: 1.6.6-001

•   Memory heap quota: 10GB

•   CommitLog and data storage share the same 2TB volume (RAID0) with the Linux OS.

•   The important performance-related parameters are listed in the table below.

    Parameter                      Value
    Max Heap Size                  10GB
    Memtable Size                  1GB *
    Index Interval                 128
    Key Cache Capacity             100000
    Replication Factor             2
    CommitLog Segment Size         128MB
    CommitLog Sync Period          10s
    Concurrent Writers (Threads)   32
    Concurrent Readers (Threads)   16
    Cell Block Size                64KB
    Consistency Check              false
    Concurrent Compaction *        false

                                                                                  3
Data Schema for CDR

[Diagram: one row per Key (User ID); cells are grouped into Date(Day) buckets,
 e.g. 20101020 … 20101024; within each bucket the CDR cells are sorted by timestamp.]

•   Schema
    –    Key: the User ID (phone number), string
    –    Bucket: the date (day) name, string
    –    Cell: a CDR, Thrift (or ProtocolBuffer) compact encoding

•   Semantics
    –    Each user's CDRs for each day are sorted by timestamp and stored together.

•   Stored Files
    –    The SSTable files are separated by bucket.

•   Flexible and applicable to various CDR structures.

•   Data Patterns
    –    A short set of temporal data that tends to be volatile.
    –    An ever-growing set of data that rarely gets accessed.

A code sketch of this mapping follows.
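
The mapping from a CDR to this schema can be illustrated with a minimal Java sketch. The `DastorClient` interface and the `Cdr` fields below are hypothetical stand-ins, not the actual DaStor client API; only the key/bucket/cell layout follows the slide above.

```java
// Minimal sketch of the CDR data model: key = user ID, bucket = day, cell sorted by timestamp.
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Date;

public class CdrDataModelSketch {

    /** A CDR reduced to a few illustrative fields. */
    static class Cdr {
        final String caller;     // also used as the row key (user ID / phone number)
        final String callee;
        final long   timestamp;  // cell ordering key within a bucket
        final int    durationSeconds;

        Cdr(String caller, String callee, long timestamp, int durationSeconds) {
            this.caller = caller;
            this.callee = callee;
            this.timestamp = timestamp;
            this.durationSeconds = durationSeconds;
        }

        /** Stand-in for the Thrift/ProtocolBuffer compact encoding (~200 bytes in practice). */
        byte[] encode() {
            String s = caller + "," + callee + "," + timestamp + "," + durationSeconds;
            return s.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical write interface: (key, bucket, cell timestamp, value). */
    interface DastorClient {
        void put(String key, String bucket, long cellTimestamp, byte[] value);
    }

    /** Map one CDR onto the schema described on this slide. */
    static void store(DastorClient client, Cdr cdr) {
        String key = cdr.caller;                                   // row key: phone number
        String bucket = new SimpleDateFormat("yyyyMMdd")
                .format(new Date(cdr.timestamp));                  // bucket name, e.g. "20101020"
        client.put(key, bucket, cdr.timestamp, cdr.encode());      // cells kept sorted by timestamp
    }
}
```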

                                                                                                      4
Storage Architecture

[Diagram: writes for keys in buckets bkt1/bkt2/bkt3 are binary-serialized into the
 Commit Log (on a dedicated disk) and into one Memtable per bucket. A flush is
 triggered by data size or memtable lifetime, producing an index file and a data
 file on disk. The index file holds <Key, Offset> maps with sparse in-memory
 indexes (K128/K256/K384 offsets) and a Bloom filter of keys; the data file holds
 <Size> <Index> <Serialized Cells> per row.]

The storage architecture draws on techniques from Google's Bigtable and other databases.
It is similar to Bigtable, but its indexing scheme is different.
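
A minimal sketch of the write path shown above, assuming illustrative class names (`CommitLog`, `SSTableWriter`): append to the commit log first, update the per-bucket memtable, and flush when a size or lifetime threshold is exceeded. This is not the actual DaStor code, only the control flow described by the diagram.

```java
// Simplified write path: commit log append, per-bucket memtable update, threshold-driven flush.
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class WritePathSketch {

    /** One memtable per bucket; rows sorted by key, cells sorted by timestamp. */
    static class Memtable {
        final String bucket;
        final long createdAt = System.currentTimeMillis();
        final ConcurrentSkipListMap<String, ConcurrentSkipListMap<Long, byte[]>> rows =
                new ConcurrentSkipListMap<>();
        volatile long bytes = 0;

        Memtable(String bucket) { this.bucket = bucket; }

        void put(String key, long timestamp, byte[] cell) {
            rows.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>()).put(timestamp, cell);
            bytes += cell.length;
        }

        boolean shouldFlush(long maxBytes, long maxLifetimeMs) {
            return bytes >= maxBytes
                    || System.currentTimeMillis() - createdAt >= maxLifetimeMs;
        }
    }

    interface CommitLog   { void append(String bucket, String key, long ts, byte[] cell) throws IOException; }
    interface SSTableWriter { void flush(Memtable memtable) throws IOException; }

    private final CommitLog commitLog;
    private final SSTableWriter sstables;
    private final Map<String, Memtable> memtables = new ConcurrentHashMap<>();
    private final long maxBytes = 1L << 30;                 // ~1GB, matching the configured memtable size
    private final long maxLifetimeMs = 24L * 3600 * 1000;   // illustrative lifetime trigger

    WritePathSketch(CommitLog commitLog, SSTableWriter sstables) {
        this.commitLog = commitLog;
        this.sstables = sstables;
    }

    void write(String key, String bucket, long timestamp, byte[] cell) throws IOException {
        commitLog.append(bucket, key, timestamp, cell);      // durability first
        Memtable mt = memtables.computeIfAbsent(bucket, Memtable::new);
        mt.put(key, timestamp, cell);
        if (mt.shouldFlush(maxBytes, maxLifetimeMs)) {
            memtables.remove(bucket);                        // simplified; real flush handles races
            sstables.flush(mt);                              // writes index file + data file
        }
    }
}
```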
                                                                                                     5
Indexing Scheme

[Diagram: four levels of indexing.]

•   Index Level-1: consistent hash ring (N=3). h(key) maps a range of the hash space
    to a node; a per-SSTable BloomFilter of keys and the KeyCache short-circuit lookups.
•   Index Level-2: sparse block index of keys (K0, K128, K256, K384, …), key interval = 128
    (changeable), held in memory and located by binary search (B-Tree style). It points
    into the key/position maps of the index file [on disk, cachable].
•   Index Level-3: sorted map, a mirror of the data; each row in the data file [on disk]
    carries a cells index and a BloomFilter of the cells on that row. Cells are stored in
    64KB blocks (changeable): Cells Block 0, Block 1, …, Block N.
•   Index Level-4: per-row block index (B-Tree, binary search) mapping each cells block
    to its position: Cells Block 0 -> Position, Cells Block 1 -> Position, …

 In total, 4 levels of indexing; a lookup sketch follows.
 Indexes are relatively small.
 Very well suited to storing per-individual data, such as per-user data.
 Good for CDR data serving.
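
The SSTable-local part of a lookup (index levels 2 and 4) can be sketched as below. The data structures are simplified stand-ins for the real index and data files; consistent hashing, Bloom filters, and the key cache are omitted.

```java
// Sketch: sparse in-memory key index -> index-file scan -> per-row block index -> one block read.
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class IndexLookupSketch {

    interface IndexFile {
        /** Scans at most `interval` entries starting at `offset`; returns the row's data-file position, or -1. */
        long findRowPosition(long offset, int interval, String key);
    }

    interface DataFile {
        /** Reads one cells block (~64KB) of the row at `rowPosition`. */
        byte[] readBlock(long rowPosition, long blockPosition);
    }

    /** Level-2: sparse in-memory key index (one entry per 128 keys), located by binary search. */
    static long sparseIndexOffset(NavigableMap<String, Long> sparseIndex, String key) {
        Map.Entry<String, Long> e = sparseIndex.floorEntry(key);
        return e == null ? 0L : e.getValue();
    }

    /** Level-4: per-row block index, maps the first timestamp of each cells block to its position. */
    static long blockPosition(TreeMap<Long, Long> rowBlockIndex, long timestamp) {
        Map.Entry<Long, Long> e = rowBlockIndex.floorEntry(timestamp);
        return e == null ? rowBlockIndex.firstEntry().getValue() : e.getValue();
    }

    static byte[] readCells(NavigableMap<String, Long> sparseIndex, IndexFile indexFile,
                            TreeMap<Long, Long> rowBlockIndex, DataFile dataFile,
                            String key, long timestamp) {
        long offset = sparseIndexOffset(sparseIndex, key);            // level-2
        long rowPos = indexFile.findRowPosition(offset, 128, key);    // short scan in the index file
        if (rowPos < 0) {
            return null;                                              // key not in this SSTable
        }
        long blockPos = blockPosition(rowBlockIndex, timestamp);      // level-4; assumes a non-empty block index
        return dataFile.readBlock(rowPos, blockPos);                  // single block read from the data file
    }
}
```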


                                                                                                                                                           6
Benchmark for Writes

• Each node runs 6 clients (threads), 54 clients in total.
• Each client generates random CDRs for 50 million users/phone-numbers and puts
  them into DaStor one by one (a client-loop sketch follows below).
   – Key space: 50 million
   – Size of a CDR: Thrift-compacted encoding, ~200 bytes

[Charts: write throughput of the cluster (9 nodes) and of one node.]

 Throughput: average ~80K ops/s; per node: average ~9K ops/s
 Latency: average ~0.5ms
 Bottleneck: network (and memory)
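
A sketch of one write-benchmark client thread as described above. `DastorClient` is a hypothetical stand-in for the real Java client API; the key format and fixed bucket name are illustrative.

```java
// One benchmark thread: pick a random user out of the 50-million key space, insert one ~200-byte CDR.
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class WriteBenchmarkClient implements Runnable {

    interface DastorClient {
        void put(String key, String bucket, long timestamp, byte[] value);
    }

    private static final int KEY_SPACE = 50000000;      // 50 million users
    private final DastorClient client;
    private final AtomicLong opsDone;                   // sampled elsewhere for throughput
    private final Random random = new Random();

    WriteBenchmarkClient(DastorClient client, AtomicLong opsDone) {
        this.client = client;
        this.opsDone = opsDone;
    }

    @Override
    public void run() {
        byte[] cdr = new byte[200];                     // ~200 bytes; Thrift-compacted in reality
        while (!Thread.currentThread().isInterrupted()) {
            String userId = String.format("%011d", random.nextInt(KEY_SPACE));  // fake phone number
            random.nextBytes(cdr);
            long now = System.currentTimeMillis();
            client.put(userId, "20101020", now, cdr);   // bucket = day
            opsDone.incrementAndGet();
        }
    }
}
```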


                                                                          7
Benchmark for Writes (cluster overview)

[Chart: cluster write throughput over time.]

               The periodic waves in throughput are caused by: (1) GC, (2) compaction.
                                                               8
Benchmark for Reads

• Each node runs 8 clients (threads), 72 clients in total.
• Each client randomly picks a user-id/phone-number out of the 50-million space and
  gets its most recent 20 CDRs (one page) from DaStor (a client sketch follows below).
• All clients read CDRs of the same day/bucket.

      ------------------------------------------------------------------------------------

• The 1st run:
   – Before compaction.
   – On average 8 SSTables per day on each node.

• The 2nd run:
   – After compaction.
   – Only one SSTable per day on each node.
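
A sketch of one read-benchmark client thread: pick a random user from the 50-million key space and fetch its most recent 20 CDRs from a single day bucket. `DastorClient.getRecent` is a hypothetical method name, not the actual client API.

```java
// One read-benchmark thread: random user, fetch one page (20 newest CDRs) from a fixed bucket.
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class ReadBenchmarkClient implements Runnable {

    interface DastorClient {
        /** Returns up to `count` cells of `key` in `bucket`, newest first. */
        List<byte[]> getRecent(String key, String bucket, int count);
    }

    private static final int KEY_SPACE = 50000000;
    private static final String BUCKET = "20101020";    // all clients read the same day/bucket
    private final DastorClient client;
    private final AtomicLong slowOps = new AtomicLong(); // ops slower than the 2s SLA bound
    private final Random random = new Random();

    ReadBenchmarkClient(DastorClient client) {
        this.client = client;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            String userId = String.format("%011d", random.nextInt(KEY_SPACE));
            long start = System.nanoTime();
            List<byte[]> page = client.getRecent(userId, BUCKET, 20);  // one page = 20 CDRs
            long latencyMs = (System.nanoTime() - start) / 1000000L;   // feeds the latency histogram
            if (latencyMs > 2000) {
                slowOps.incrementAndGet();                             // tracks the "97% < 2s" SLA
            }
        }
    }
}
```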


                                                                                             9
Benchmark for Reads (before compaction)

[Charts: read throughput of the cluster (9 nodes) and of one node.]

[Histogram: percentage of read ops per latency bucket (x-axis: latency in 100ms
 buckets, 1–61; y-axis: 0%–25%).]

 Throughput: average ~140 ops/s; per node: average ~16 ops/s
 Latency: average ~500ms, 97% < 2s (SLA)
 Bottleneck: disk IO (random seeks); CPU load is very low
                                                                                                           10
Benchmark for Reads (after compaction)

[Charts: read throughput of the cluster (9 nodes) and of one node.]

[Histogram: percentage of read ops per latency bucket (x-axis: latency in 100ms
 buckets, 1–33; y-axis: 0%–100%).]

 Compaction of ~8 SSTables (~200GB). Time: 1h40m on a 16-core node; 2h25m on an 8-core node.
 Throughput: average ~1.1K ops/s; per node: average ~120 ops/s
 Latency: average ~60ms, 95% < 500ms (SLA)
 Bottleneck: disk IO (random seeks); CPU load is very low
                                                                                                              11
Benchmark for (Writes + Reads)




                                 12
Experiences

• A large Memtable reduces the frequency of regular compaction.
   – We found that 1GB works well.

• A large key space requires more memory, because of more key indexes.
   – Especially for the key cache.
   – Memory-mapped index files help.

• Compaction is sensitive to the number of CPU cores and to L2/L3 cache size.
   – On a 16-core node: a large (e.g. 200GB) compaction may take 100 minutes.
   – On an 8-core node: a large (e.g. 200GB) compaction may take 150 minutes.
   – Long compactions may leave many small SSTables behind, which reduces read
     performance.
   – We now support concurrent compaction (sketched below).

• Number of CPU cores, L2/L3 cache, disks, RAM size:
   – CPU cores, L2/L3 cache: writes, compaction
   – Disks: random seeks and reads
   – RAM: memtables for writes, index caches for random reads.
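
The bucket-independent concurrent compaction mentioned above can be sketched as a per-bucket task pool. `SSTable` and `Compactor` are illustrative interfaces, not the actual DaStor classes; only the "one compaction task per bucket, run in parallel" idea follows the slide.

```java
// Since SSTables are separated by bucket, each bucket's compaction can run on its own worker thread.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentCompactionSketch {

    interface SSTable { }
    interface Compactor { SSTable compact(String bucket, List<SSTable> inputs); }

    private final Compactor compactor;
    // Pool size bounded by the core count, since compaction is CPU-sensitive.
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    ConcurrentCompactionSketch(Compactor compactor) {
        this.compactor = compactor;
    }

    /** Compact every bucket's SSTables independently and in parallel. */
    void compactAll(Map<String, List<SSTable>> sstablesByBucket) {
        for (Map.Entry<String, List<SSTable>> e : sstablesByBucket.entrySet()) {
            pool.submit(() -> compactor.compact(e.getKey(), e.getValue()));
        }
    }
}
```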
                                                                                13
Maintenance Tools

•   Daily Flush Tool
    – Flushes memtables of old buckets.
    – Uses the dastor-admin tool.

•   Daily Compaction Tool
    – Compacts SSTables of old buckets.
    – Uses the dastor-admin tool.

              DaStor Admin Tool (bin/dastor-admin, bin/dastor-admin-shell)


                                                                                           14
Admin Web




            15
CDR Query Web Page for Demo




                              16
Developed Features

• Admin Tools
   – Configuration improvements based on config files
   – Script framework and scripts
   – Admin tools
   – CLI shell
   – WebAdmin
   – Ganglia, Jmxetric

• Compression
   – New serialization format.
   – Support for Gzip and LZO (see the compression sketch after this list).

• Bucket mapping and reclaim
   – Mapping plug-in
   – Reclaim command and mechanism.

• Java Client API

• Concurrent Compaction
   – From a single thread to bucket-independent multiple threads.

• Scalability
   – Easy to scale out
   – More controllable

• Benchmarks
   – Writes and Reads
   – Throughput and Latency

• Bug fixes
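
A small sketch of block compression using the standard `java.util.zip` GZIP codec (LZO requires a third-party codec). It only illustrates compressing a serialized cells block before it is written; it is not the actual DaStor serialization format.

```java
// Compress one serialized cells block (e.g. a 64KB block) with GZIP before writing it to the data file.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class BlockCompressionSketch {

    static byte[] gzip(byte[] cellsBlock) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(out);
        try {
            gz.write(cellsBlock);
        } finally {
            gz.close();   // finishes the GZIP stream and writes the trailer
        }
        return out.toByteArray();
    }
}
```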

                                                                            17
Controllable Carrier-level Scale-out

• Existing cluster
   (1) Available Partitioning-A
   (2) Existing buckets with data

• New machines are added into the cluster, but not yet online
   (1) Available Partitioning-A
   (2) Existing buckets with data
   (3) New Partitioning-B defined for future buckets, but not available yet

• The added machines go online
   (1) Available Partitioning-A
   (2) Existing buckets with data
   (3) New Partitioning-B available for service, coexisting with Partitioning-A.
       No data movement.

• As time passes, data in the old buckets is reclaimed
   (1) gone
   (2) gone
   (3) Only Partitioning-B remains, available for service

A routing sketch for this bucket-to-partitioning mapping follows.
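
The idea behind the bucket-mapping plug-in can be sketched as a routing rule: buckets older than a cutover date stay on the old ring (Partitioning-A), newer buckets go to the new ring (Partitioning-B), so no data moves. `Partitioning` and the cutover bucket are illustrative names, not the actual plug-in interface.

```java
// Route each bucket to the partitioning (ring) that was in effect when the bucket was created.
public class BucketPartitioningMapper {

    interface Partitioning {
        /** Returns the node(s) responsible for a row key under this ring layout. */
        java.util.List<String> endpointsFor(String rowKey);
    }

    private final Partitioning oldRingA;
    private final Partitioning newRingB;
    private final String cutoverBucket;   // e.g. "20101101": first bucket served by the new ring

    BucketPartitioningMapper(Partitioning oldRingA, Partitioning newRingB, String cutoverBucket) {
        this.oldRingA = oldRingA;
        this.newRingB = newRingB;
        this.cutoverBucket = cutoverBucket;
    }

    /** Buckets are day strings (yyyyMMdd), so lexical order equals date order. */
    Partitioning partitioningFor(String bucket) {
        return bucket.compareTo(cutoverBucket) < 0 ? oldRingA : newRingB;
    }

    java.util.List<String> endpointsFor(String bucket, String rowKey) {
        return partitioningFor(bucket).endpointsFor(rowKey);   // old buckets: ring A; new buckets: ring B
    }
}
```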
                                                                                                18
Data Processing and Analysis for BI & DM

[Diagram: BI & DM Apps on top; Hive (QL, Table Meta) and a rich API over the
 MapReduce Framework; InputFormat/OutputFormat plug-ins connect MapReduce to
 DaStor (Data Storage) at the bottom.]

• Integration with MapReduce, Hive, etc.
• Provides a SQL-like interface and a rich API for BI and DM.
• Built-in plug-ins for the MapReduce framework (an example job sketch follows).
• Flexible data structure description and tabular management.
• The simple and flexible data model of DaStor is well suited for analysis,
  since past buckets are stable.
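
A sketch of how a MapReduce job might read CDRs through the InputFormat plug-in mentioned above. `DaStorInputFormat` is a hypothetical class name (left commented out so the sketch compiles without it); the Hadoop job wiring itself is standard.

```java
// Count CDR rows per user via MapReduce; the DaStor-specific InputFormat is a hypothetical plug-in.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;

public class CdrCountJob {

    /** Assumes the plug-in yields (user ID, serialized day-bucket cells) pairs. */
    static class CdrCountMapper extends Mapper<Text, Text, Text, LongWritable> {
        @Override
        protected void map(Text userId, Text serializedCells, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(userId, new LongWritable(1));   // one (user, bucket) row per record
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "cdr-count");
        job.setJarByClass(CdrCountJob.class);
        // job.setInputFormatClass(DaStorInputFormat.class);  // hypothetical DaStor plug-in
        job.setMapperClass(CdrCountMapper.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```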
                                                                          19
Further Works

• More flexible and manageable Cache
    – Capacity/memory size control.
    – Methods to load and free the cache.

• Scalability feature for operational scale
    – Version 1.6.6 + controllable scalability

• Compression improvement
    – To reduce the number of disk seeks.

• Admin Tools
    – Configuration, monitoring, control …
    – More professional and easier to use

• Client API enhancement
    – Hide the individual node to be connected to.
    – More API methods.

• Flexible consistency check

• Deployment tools
    – Consider: Capistrano, PyStatus, Puppet, Chef …

• Data Analysis
    – Hadoop

• Documents
    – API Manual
    – Admin Manual

• Test
    – New features
    – Performance
    – mmap …
                                                                                 20
DaStor/Cassandra vs. Bigtable
•   Scalability: Bigtable has better scalability.
     –   Scaling DaStor must be controlled carefully and may affect services; this is a major drawback.
     –   Scaling Bigtable is easy.

•   Data Distribution: Bigtable's high-level partitioning/indexing scheme is more fine-grained, and therefore more effective.
     –   DaStor's consistent-hash partitioning scheme is too coarse-grained, so we must further split the bucket-level partitions. But
         sometimes that trade-off is not easy to make on big data.

•   Indexing: Bigtable may need less memory to hold indexes.
     –   Bigtable's indexes are more general and can be shared (amortized) across different users/rows, especially when data is skewed.
     –   There is only one copy of the indexes in Bigtable, even with multiple storage replicas, since Bigtable uses the GFS layer for
         replication (multiple copies of data, one copy of indexes).

•   Local Storage Engine: Bigtable provides better read performance with fewer disk seeks.
     –   Bigtable vs. Cassandra is like InnoDB vs. MyISAM.

•   Bigtable's write/mutation performance is lower.
     –   Commit Log: if GFS/HDFS supported fine-grained configuration to place an individual directory on a dedicated disk, then …

•   So, Bigtable's architecture and data model make more sense.
     –   The Cassandra project is a mistake; mixing Dynamo and Bigtable is a big mistake.
     –   In my opinion, Cassandra is only a partial Dynamo, targeted at the wrong field (data storage). It is distorted.


                                                                                                                                               21
