Memory-Based Cloud
Architectures



Jan Schaffner

Enterprise Platform and Integration Concepts Group
Definition




             Cloud Computing
                    =
             Data Center + API
What to take home from this talk?
Answers to four questions:


    -  Why are memory-based architectures great for cloud computing?


    -  How predictable is the behavior of an in-memory column database?


    -  Does virtualization have a negative impact on in-memory databases?


    -  How do I assign tenants to servers in order to manage fault-tolerance and
       scalability?
First question




        Why are memory-based architectures
            great for cloud computing?
Numbers everyone should know
•  L1 cache reference                    0.5 ns
•  Branch mispredict                     5 ns
•  L2 cache reference                    7 ns
•  Mutex lock/unlock                     25 ns
•  Main memory reference                 100 ns (in 2008)
•  Compress 1K bytes with Zippy          3,000 ns
•  Send 2K bytes over 1 Gbps network     20,000 ns
•  Read 1 MB sequentially from memory    250,000 ns
•  Round trip within same datacenter     500,000 ns (in 2008)
•  Disk seek                             10,000,000 ns
•  Read 1 MB sequentially from network   10,000,000 ns
•  Read 1 MB sequentially from disk      20,000,000 ns
•  Send packet CA → Netherlands → CA     150,000,000 ns
                                                            Source: Jeff Dean
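
To make the gaps in this table concrete, here is a minimal sketch that turns a few of the listed latencies into ratios. The values are copied from the slide; the dictionary keys and the helper name `ratio` are only illustrative.

```python
# Latencies in nanoseconds, copied from the slide (source: Jeff Dean).
LATENCY_NS = {
    "main memory reference": 100,
    "read 1 MB sequentially from memory": 250_000,
    "round trip within same datacenter": 500_000,
    "disk seek": 10_000_000,
    "read 1 MB sequentially from disk": 20_000_000,
}


def ratio(slow: str, fast: str) -> float:
    """Return how many times longer `slow` takes than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


# One disk seek costs as much as this many main memory references.
print(ratio("disk seek", "main memory reference"))
# Sequential 1 MB read: disk versus memory.
print(ratio("read 1 MB sequentially from disk",
            "read 1 MB sequentially from memory"))
# One disk seek versus one round trip inside the datacenter.
print(ratio("disk seek", "round trip within same datacenter"))
```

By these numbers, a network round trip to a second machine is an order of magnitude cheaper than a single disk seek, which is the observation the following slides build on.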
Memory should be the
system of record

•  Typically disks have been the system of record
   -  Slow → wrap them in complicated caching and distributed file
      systems to make them perform
   -  Memory used as cache all over the place but it can be
      invalidated when something changes on disk
•  Bandwidth:
   -  Disk:                   120 MB/s/controller
   -  DRAM (x86 + FSB):       10.4 GB/s/board
   -  DRAM (Nehalem):         25.6 GB/s/socket
•  Latency:
   -  Disk:           13 milliseconds (up to seconds when queuing)
   -  InfiniBand:     1-2 microseconds
   -  DRAM:           5 nanoseconds
High-end networks vs. disks


                        Maximum bandwidths:

       Hard Disk                 100-120 MB/s
       SSD                       250 MB/s
       Serial ATA II             600 MB/s
       10 Gb Ethernet            1204 MB/s
       InfiniBand                1250 MB/s (4 channels)
       PCIe Flash Storage        1400 MB/s
       PCIe 3.0                  32 GB/s
       DDR3-1600                 25.6 GB/s (dual channel)
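
As a rough illustration of what these bandwidths mean for a scan-heavy database, the sketch below computes how long it would take to move a dataset through each medium at its maximum bandwidth. The 3 GB dataset size is a hypothetical example, 1 GB is taken as 1000 MB, and real transfers would not reach the peak figures.

```python
# Maximum bandwidths in MB/s, copied from the slide.
BANDWIDTH_MB_S = {
    "Hard disk": 120,       # upper end of the 100-120 MB/s range
    "SSD": 250,
    "InfiniBand": 1250,
    "DDR3-1600": 25_600,    # 25.6 GB/s, taking 1 GB = 1000 MB
}


def transfer_seconds(size_mb: float, medium: str) -> float:
    """Best-case time to move `size_mb` megabytes through `medium`."""
    return size_mb / BANDWIDTH_MB_S[medium]


DATASET_MB = 3000  # hypothetical 3 GB dataset

for medium in BANDWIDTH_MB_S:
    print(f"{medium:<12} {transfer_seconds(DATASET_MB, medium):8.2f} s")
```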
Designing a database for the cloud

•  Disks are the limiting factor in contemporary database systems
   -  Sharing a high-performance disk on a machine/cluster/cloud is
      fine/troublesome/miserable
   -  While one guy is fetching 100 MB/s, everyone else is waiting


•  Claim: Two machines + network is better than one machine + disk
   -  Log to disk on a single node:
      > 10,000 µs     (not predictable)
   -  Transactions only in memory but on two nodes:
      < 600 µs        (more predictable)


•  Concept: Design to the strengths of the cloud (redundancy) rather than
   its weaknesses (shared anything)
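
A back-of-the-envelope sketch of what the two commit latencies imply, under the simplifying assumption (not made on the slide) that commits are strictly serialized with no batching or group commit. The function name is hypothetical.

```python
def max_serial_commits_per_second(commit_latency_us: float) -> float:
    """Upper bound on commits/s if each commit must finish before the next starts."""
    return 1_000_000 / commit_latency_us


# Latency bounds taken from the slide.
disk_log = max_serial_commits_per_second(10_000)   # log to disk on one node
two_nodes = max_serial_commits_per_second(600)     # in memory on two nodes

print(f"disk log:  at most {disk_log:.0f} commits/s")
print(f"two nodes: at least {two_nodes:.0f} commits/s")
```

Because the slide gives "> 10,000 µs" and "< 600 µs", the first figure is an upper bound and the second a lower bound under this assumption.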
Design choices for a cloud database

•  No disks (in-memory delta tables + async snapshots)
•  Multi-master replication
    -  Two copies of the data
    -  Load balancing both reads and (monotonic) writes
    -  (Eventual) consistency achieved via MVCC (+ Paxos, later)
•  High-end hardware
    -  Nehalem for high memory bandwidth
    -  Fast interconnect
•  Virtualization
    -  Ease of deployment/administration
    -  Consolidation/multi-tenancy
Why consolidation?

•  In-memory column databases are ideal for mixed workload
   processing
•  But: In a SaaS environment it seems costly to give everybody
   their private NewDB box


•  How much consolidation is possible?
   -  3 years' worth of sales records from our favorite
      Fortune 500 retail company
   -  360 million records
   -  Less than 3 GB in compressed columns in memory
   -  Next door is a machine with 1 TB of DRAM
   -  (Beware of overhead)
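
The consolidation potential can be estimated with simple arithmetic. This sketch uses the figures from the slide (taking "less than 3 GB" as an upper bound and 1 TB as 10^12 bytes); the `overhead_factor` parameter is a hypothetical knob standing in for the "beware of overhead" caveat, not a measured value.

```python
RECORDS = 360_000_000
COMPRESSED_BYTES = 3 * 10**9     # upper bound: "less than 3 GB"
SERVER_DRAM_BYTES = 10**12       # 1 TB of DRAM

# Upper bound on the compressed size of one record.
bytes_per_record = COMPRESSED_BYTES / RECORDS


def tenants_per_server(overhead_factor: float) -> int:
    """Tenants of this size that fit in DRAM, given a memory overhead factor.

    overhead_factor = 1.0 means the compressed data is all that is needed;
    larger values reserve room for working memory, deltas, and so on.
    """
    return int(SERVER_DRAM_BYTES // (COMPRESSED_BYTES * overhead_factor))


print(f"{bytes_per_record:.1f} bytes per record (upper bound)")
print(tenants_per_server(1.0), "tenants with no overhead")
print(tenants_per_server(5.0), "tenants if data may use only 1/5 of DRAM")
```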
Multi-tenancy in the database –
four different options
•  No multi-tenancy – one VM per tenant
    -  Ex.: RightNow has 3000 tenants in 200 databases (2007):
       3000 vs. 200 Amazon VMs cost $2,628,000 vs. $175,200/year
    -  Very strong isolation


•  Shared machine – one database process per tenant
    -  Scheduler, session manager and transaction manager need to
       live inside the individual DB processes: IPC for synchronization
    -  Good for custom extensions, good isolation


•  Shared process – one schema instance per tenant
    -  Must support large numbers of tables
    -  Must support online schema extension and evolution


•  Shared table – use a tenant_id column and partitioning
    -  Bad for custom extensions, bad isolation
    -  Hard to backup/restore/migrate individual tenants
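
The RightNow cost comparison in the first option can be checked with a few lines. This is only a sketch: the yearly totals and VM counts come from the slide, while the per-hour rate is derived from them rather than stated in the source.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def implied_hourly_rate(total_per_year: float, vm_count: int) -> float:
    """Hourly price per VM implied by a yearly total for `vm_count` VMs."""
    return total_per_year / vm_count / HOURS_PER_YEAR


one_vm_per_tenant = 2_628_000   # USD/year for 3000 VMs (from the slide)
one_vm_per_database = 175_200   # USD/year for 200 VMs (from the slide)

print(implied_hourly_rate(one_vm_per_tenant, 3000))    # rate for 3000 VMs
print(implied_hourly_rate(one_vm_per_database, 200))   # rate for 200 VMs
print(one_vm_per_tenant - one_vm_per_database, "USD/year saved by sharing")
```

Both totals imply the same per-VM rate, so the difference is purely the factor of 15 in the number of VMs.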
Putting it all together:
Rock cluster architecture

[Architecture diagram, top to bottom]
   Application Server (x2)
   OLTP System → Importer      Extract data from external system
   Rock: Cluster Master        Cluster membership, tenant placement
   Router (x2)                 Load balance between replicas
   Adapter (x3)                Forward writes to other replicas
   TREX (x3)
Second question




       How predictable is the behavior of an
          in-memory column database?
What does “predictable” mean?
•  Traditionally, database people are concerned with questions of the type
   “how do I make a query faster?”


•  In a SaaS environment, the question is
   “how do I get a fixed (low) response time as cheaply as possible?”
    -  Look at throughput
    -  Look at quantiles (e.g. 99th percentile)


•  Example formulation of desired performance:
    -  Response time goal “1 second in the 99th percentile”
    -  Average response time around 200 ms
    -  Less than 1% of all queries exceed 1,000 ms
    -  Results in a maximum number of concurrent queries before
       response time goal is violated
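
A minimal sketch of how such a response time goal could be checked against measured samples. It uses the nearest-rank definition of a percentile, which is one of several common definitions and an assumption here; the function names and the sample data are hypothetical.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: smallest sample with at least q% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def meets_goal(response_times_ms, goal_ms=1000, q=99):
    """True if the q-th percentile of the samples is within the goal."""
    return percentile(response_times_ms, q) <= goal_ms


# Hypothetical measurements: mostly 200 ms with a few slow outliers.
ok_run = [200] * 99 + [5000]        # 1 of 100 queries is slow
bad_run = [200] * 98 + [5000] * 2   # 2 of 100 queries are slow

print(meets_goal(ok_run))
print(meets_goal(bad_run))
```

Under nearest-rank, a run passes as long as at most 1% of the queries exceed the goal, regardless of how slow those outliers are.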
System capacity
•  Fixed amount of data split equally among all tenants

[Figure: requests/s (20-200) vs. tenant size in MB (20-220); series: measured, approximated function]

•  On a logarithmic scale the graph is linear, so the relation can be expressed as:
       log(f(tSize)) = m · log(tSize) + n
   where n is the intercept with the y-axis and m is the gradient;
   estimated values: m ≈ −0.945496 and n ≈ 3.6174113
•  Test setup: each test contained several tenants of a particular tenant size,
   so that 20% of the available main memory was always used for tenant data;
   data was preloaded into main memory and a six-minute warm-up was performed
   before each run; the test data was generated by a Star Schema data generator [36]
•  Capacity ~ bytes scanned per second
   (there is a small overhead when processing more requests)
•  In-memory databases behave very linearly!
Workload
•  Tenants generally have different rates and sizes
•  For a given set of T tenants (on one server) define




•  When Workload = 1
    -  System runs at its maximum throughput level
    -  Further increase of workload will result in violation of
       response time goal
Response time
•  Different amounts of data and different request rates (“assorted mix”)
•  Workload is varied by scaling the request rates

[Figure: 99th-percentile response time in ms (0-2500) vs. workload (0.2-1.1); series: tenant data 1.5 GB, 2.0 GB, 2.6 GB, 3.2 GB, and prediction]
Impact of writes
•  Added periodic batch writes (fact table grows by 0.5% every 5 minutes)

[Figure: 99th-percentile response time in ms (0-2000) vs. workload (0.2-1.1); series: without writes, with writes, prediction without writes, prediction with writes]
Why is predictability good?
•  Ability to plan and perform resource-intensive tasks during normal
   operations:
    -  Upgrades
    -  Merges
    -  Migrations of tenants in the cluster (e.g. to dynamically
       re-balance the load situation in the cluster)

[Figure: cost breakdown for migration of tenants]
Definition




             Cloud Computing
                    =
             Data Center + API
Third question




     Does virtualization have a negative impact
             on in-memory databases?
Impact of virtualization
•  Run multi-tenant OLAP benchmark on either:
    -  one TREX instance directly on the physical host, or
    -  one TREX instance inside a VM on the physical host


•  Overhead is approximately 7% (both in response time and throughput)

[Figures: average response time (600-2600) and queries per second (1-5.5) vs. client threads (0-12); series: virtual, physical]
Impact of virtualization (contd.)
•  Virtualization is often used to get “better” system utilization
    -  What happens when a physical machine is split into multiple VMs?
    -  Burning CPU cycles does not hurt → memory bandwidth is the
       limiting factor

[Figure: response time as a percentage of response time with 1 active slot (80%-160%) vs. concurrently active VM slots (1-4); series: Xeon E5450, Xeon X5650]
Fourth question




        How do I assign tenants to servers
        in order to manage fault-tolerance
                  and scalability?
Why is it good to have multiple
copies of the data?
•  Scalability beyond a certain number of concurrently active users
•  High availability during normal operations
•  Alternating execution of resource-intensive operations (e.g. merge)
•  Rolling upgrades without downtime
•  Data migration without downtime


•  Reminder: Two in-memory copies allow faster writes and are more
   predictable than one in-memory copy plus disk


•  Downsides:
    -  Response time goal might be violated during recovery
    -  You need to plan for twice the capacity
Tenant placement

Conventional mirrored layout
    Node 1: T1, T2          Node 2: T1, T2
    Node 3: T3, T4          Node 4: T3, T4
    If a node fails, all work moves to one other node. The system must
    be 100% over-provisioned.

Interleaved layout
    Node 1: T1, T2, T3      Node 2: T1, T4, T5
    Node 3: T2, T5, T6      Node 4: T3, T4, T6
    If a node fails, work moves to many other nodes. Allows
    higher utilization of nodes.
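
The difference between the two layouts can be simulated in a few lines. This sketch reads the tenant groupings off the diagram (the node numbering is an assumption) and further assumes that every tenant generates one unit of load, split evenly across its live replicas. It is an illustration, not the evaluation method used in the talk.

```python
# Tenant groupings per node, read off the diagram; node numbers are arbitrary.
MIRRORED = {1: {"T1", "T2"}, 2: {"T1", "T2"},
            3: {"T3", "T4"}, 4: {"T3", "T4"}}
INTERLEAVED = {1: {"T1", "T2", "T3"}, 2: {"T1", "T4", "T5"},
               3: {"T2", "T5", "T6"}, 4: {"T3", "T4", "T6"}}


def node_loads(layout, failed=None):
    """Load per surviving node; each tenant's unit load is split over live replicas."""
    live = {node: tenants for node, tenants in layout.items() if node != failed}
    replicas = {}
    for tenants in live.values():
        for tenant in tenants:
            replicas[tenant] = replicas.get(tenant, 0) + 1
    return {node: sum(1 / replicas[t] for t in tenants)
            for node, tenants in live.items()}


def worst_case_increase(layout):
    """Largest relative load increase on any node after any single node failure."""
    normal = node_loads(layout)
    worst = 0.0
    for failed in layout:
        after = node_loads(layout, failed)
        worst = max(worst, max(after[n] / normal[n] for n in after))
    return worst - 1


print(f"mirrored:    +{worst_case_increase(MIRRORED):.0%}")
print(f"interleaved: +{worst_case_increase(INTERLEAVED):.0%}")
```

In the mirrored layout the surviving partner absorbs the whole load of the failed node; in the interleaved layout each pair of nodes shares only one tenant, so the extra load is spread over three nodes.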
Handcrafted best case
•  Perfect placement:
    -  100 tenants
    -  2 copies/tenant
    -  All tenants have same size
    -  10 tenants/server

       Mirrored              Interleaved
       1 1 4 4 7 7           1 4 7 1 2 3
       2 2 5 5 8 8           2 5 8 4 5 6
       3 3 6 6 9 9           3 6 9 7 8 9

•  Perfect balancing (same load on all tenants):
    -  6M rows (204 MB compressed) of data per tenant
    -  The same (increasing) number of users per tenant
    -  No writes

Throughput before violating response time goal:

                                Mirrored      Interleaved    Improvement
    No failures                 4218 users    4506 users          7%
    Periodic single failures    2265 users    4250 users         88%
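
The figures on this slide can be cross-checked with a short calculation. The user counts and placement parameters are taken from the slide; the helper name is hypothetical.

```python
def improvement_pct(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved / baseline - 1) * 100


# Placement parameters from the slide.
tenants, copies_per_tenant, tenants_per_server = 100, 2, 10
servers = tenants * copies_per_tenant // tenants_per_server
print(servers, "servers")

# Supported users before the response time goal is violated.
print(f"no failures:     {improvement_pct(4218, 4506):.1f}%")
print(f"single failures: {improvement_pct(2265, 4250):.1f}%")

# Throughput retained under periodic single failures, per layout.
print(f"mirrored keeps    {2265 / 4218:.0%}")
print(f"interleaved keeps {4250 / 4506:.0%}")
```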
Requirements for placement algorithm
•  An optimal placement algorithm needs to cope with
   multiple (conflicting) goals:
    -  Balance load across servers
    -  Achieve good interleaving


•  Use migrations consciously for online layout improvements
   (no big bang cluster re-organization)


•  Take usage patterns into account
    -  Request rates double during last week before end of quarter
    -  Time-zones, Christmas, etc.
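
To illustrate how the two goals interact, here is a hypothetical greedy heuristic. It is not the algorithm from the talk, and it ignores migrations and usage patterns. For each tenant it picks the group of servers that currently shares the fewest tenants (interleaving) and, among those, the group whose busiest server has the least load (balance).

```python
from collections import Counter
from itertools import combinations


def place_tenants(loads, n_servers, copies=2):
    """Greedy placement sketch: returns ({tenant: servers}, load per server)."""
    server_load = [0.0] * n_servers
    shared = Counter()   # (a, b) -> number of tenants hosted on both a and b
    placement = {}
    # Place the heaviest tenants first.
    for tenant, load in sorted(loads.items(), key=lambda kv: -kv[1]):
        best = min(
            combinations(range(n_servers), copies),
            key=lambda group: (
                sum(shared[pair] for pair in combinations(group, 2)),
                max(server_load[s] for s in group),
                group,   # deterministic tie-break
            ),
        )
        for s in best:
            server_load[s] += load / copies   # load split across replicas
        for pair in combinations(best, 2):
            shared[pair] += 1
        placement[tenant] = best
    return placement, server_load


# Six equally loaded tenants on four servers.
tenant_loads = {f"T{i}": 1.0 for i in range(1, 7)}
placement, server_load = place_tenants(tenant_loads, n_servers=4)
for tenant, servers in placement.items():
    print(tenant, "->", servers)
print(server_load)
```

The search over all server groups is exhaustive, so this only scales to small clusters; it is meant to show the two objectives side by side rather than to be used as is.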
Conclusion
•  Answers to four questions:


    -  Why are memory-based architectures great for cloud computing?


    -  How predictable is the behavior of an in-memory column database?


    -  Does virtualization have a negative impact on in-memory databases?


    -  How do I assign tenants to servers in order to manage fault-tolerance and
       scalability?




•  Questions?

Advanced databases   ben stopfordAdvanced databases   ben stopford
Advanced databases ben stopford
 
Kinetic basho public
Kinetic basho publicKinetic basho public
Kinetic basho public
 
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a LaptopProject Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop
 
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
 

Plus de 小新 制造

Storage: Alternate Futures
Storage: Alternate FuturesStorage: Alternate Futures
Storage: Alternate Futures小新 制造
 
架构大数据 挑战、现状与展望
架构大数据 挑战、现状与展望架构大数据 挑战、现状与展望
架构大数据 挑战、现状与展望小新 制造
 
Altibase管理培训 优化篇 v1.1
Altibase管理培训 优化篇 v1.1Altibase管理培训 优化篇 v1.1
Altibase管理培训 优化篇 v1.1小新 制造
 
Altibase管理培训 管理篇
Altibase管理培训 管理篇Altibase管理培训 管理篇
Altibase管理培训 管理篇小新 制造
 
Altibase管理培训 安装篇
Altibase管理培训 安装篇Altibase管理培训 安装篇
Altibase管理培训 安装篇小新 制造
 
Ibm solid db overview v6.3 20090320
Ibm solid db overview v6.3 20090320Ibm solid db overview v6.3 20090320
Ibm solid db overview v6.3 20090320小新 制造
 
Oracle clusterware overview_11g_en
Oracle clusterware overview_11g_enOracle clusterware overview_11g_en
Oracle clusterware overview_11g_en小新 制造
 

Plus de 小新 制造 (10)

Storage: Alternate Futures
Storage: Alternate FuturesStorage: Alternate Futures
Storage: Alternate Futures
 
内存数据库[1]
内存数据库[1]内存数据库[1]
内存数据库[1]
 
架构大数据 挑战、现状与展望
架构大数据 挑战、现状与展望架构大数据 挑战、现状与展望
架构大数据 挑战、现状与展望
 
Altibase管理培训 优化篇 v1.1
Altibase管理培训 优化篇 v1.1Altibase管理培训 优化篇 v1.1
Altibase管理培训 优化篇 v1.1
 
Altibase管理培训 管理篇
Altibase管理培训 管理篇Altibase管理培训 管理篇
Altibase管理培训 管理篇
 
Altibase管理培训 安装篇
Altibase管理培训 安装篇Altibase管理培训 安装篇
Altibase管理培训 安装篇
 
Altibase介绍
Altibase介绍Altibase介绍
Altibase介绍
 
Ibm solid db_基础
Ibm solid db_基础Ibm solid db_基础
Ibm solid db_基础
 
Ibm solid db overview v6.3 20090320
Ibm solid db overview v6.3 20090320Ibm solid db overview v6.3 20090320
Ibm solid db overview v6.3 20090320
 
Oracle clusterware overview_11g_en
Oracle clusterware overview_11g_enOracle clusterware overview_11g_en
Oracle clusterware overview_11g_en
 

Dernier

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Dernier (20)

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Memory-Based Cloud Architectures

  • 1. Memory-Based Cloud Architectures. Jan Schaffner, Enterprise Platform and Integration Concepts Group
  • 2. Definition: Cloud Computing = Data Center + API
  • 3. What to take home from this talk? Answers to four questions:
      - Why are memory-based architectures great for cloud computing?
      - How predictable is the behavior of an in-memory column database?
      - Does virtualization have a negative impact on in-memory databases?
      - How do I assign tenants to servers in order to manage fault tolerance and scalability?
  • 4. First question: Why are memory-based architectures great for cloud computing?
  • 5. Numbers everyone should know (source: Jeff Dean)
      L1 cache reference                            0.5 ns
      Branch mispredict                               5 ns
      L2 cache reference                              7 ns
      Mutex lock/unlock                              25 ns
      Main memory reference                         100 ns (in 2008)
      Compress 1K bytes with Zippy                3,000 ns
      Send 2K bytes over 1 Gbps network          20,000 ns
      Read 1 MB sequentially from memory        250,000 ns
      Round trip within same datacenter         500,000 ns (in 2008)
      Disk seek                              10,000,000 ns
      Read 1 MB sequentially from network    10,000,000 ns
      Read 1 MB sequentially from disk       20,000,000 ns
      Send packet CA → Netherlands → CA     150,000,000 ns
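The gap between memory and disk in the table above is easier to judge as a ratio. The following is a minimal sketch that only divides figures copied from the slide; the dictionary name and the helper function are illustrative and not part of the talk.

```python
# Latencies in nanoseconds, copied from the "Numbers everyone should know" slide.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
}


def ratio(slow: str, fast: str) -> float:
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    # One disk seek costs as much as this many main memory references.
    print(ratio("Disk seek", "Main memory reference"))
    # Sequential 1 MB read: disk versus memory.
    print(ratio("Read 1 MB sequentially from disk",
                "Read 1 MB sequentially from memory"))
    # A disk seek versus a round trip to another machine in the datacenter.
    print(ratio("Disk seek", "Round trip within same datacenter"))
```

With the slide's figures, a disk seek costs as much as 100,000 main memory references, and even a full round trip to a second machine in the same datacenter is 20 times cheaper than a seek.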
  • 6. Memory should be the system of record
      - Typically, disks have been the system of record
          - Slow → wrap them in complicated caching and distributed file systems to make them perform
          - Memory is used as a cache all over the place, but it can be invalidated when something changes on disk
      - Bandwidth:
          - Disk: 120 MB/s per controller
          - DRAM (x86 + FSB): 10.4 GB/s per board
          - DRAM (Nehalem): 25.6 GB/s per socket
      - Latency:
          - Disk: 13 milliseconds (up to seconds when queuing)
          - InfiniBand: 1-2 microseconds
          - DRAM: 5 nanoseconds
  • 7. High-end networks vs. disks. Maximum bandwidths:
      Hard disk                 100-120 MB/s
      SSD                       250 MB/s
      Serial ATA II             600 MB/s
      10 Gb Ethernet            1204 MB/s
      InfiniBand                1250 MB/s (4 channels)
      PCIe flash storage        1400 MB/s
      PCIe 3.0                  32 GB/s
      DDR3-1600                 25.6 GB/s (dual channel)
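To make the bandwidth figures concrete, the sketch below converts them into the time needed to move a given amount of data. It assumes the upper figure for the hard disk and 1 GB = 1000 MB when converting the GB/s entries; both choices, and all names in the code, are assumptions of this example rather than statements from the talk.

```python
# Maximum bandwidths in MB/s, taken from the "High-end networks vs. disks" slide.
# GB/s values are converted with 1 GB = 1000 MB (an assumption of this sketch).
BANDWIDTH_MB_S = {
    "Hard disk": 120,
    "SSD": 250,
    "Serial ATA II": 600,
    "10 Gb Ethernet": 1204,
    "InfiniBand (4 channels)": 1250,
    "PCIe flash storage": 1400,
    "DDR3-1600 (dual channel)": 25_600,
    "PCIe 3.0": 32_000,
}


def transfer_seconds(size_mb: float, medium: str) -> float:
    """Best-case time to stream `size_mb` megabytes over `medium`."""
    return size_mb / BANDWIDTH_MB_S[medium]


if __name__ == "__main__":
    # Time to stream 1000 MB over each medium, slowest first.
    for medium in BANDWIDTH_MB_S:
        print(f"{medium:26s} {transfer_seconds(1000, medium):8.3f} s")
```

These are best-case streaming times; seek latency and queuing on a shared disk only widen the gap.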
  • 8. Designing a database for the cloud
      - Disks are the limiting factor in contemporary database systems
          - Sharing a high-performance disk on a machine/cluster/cloud is fine/troublesome/miserable
          - While one guy is fetching 100 MB/s, everyone else is waiting
      - Claim: Two machines + network is better than one machine + disk
          - Log to disk on a single node: > 10,000 µs (not predictable)
          - Transactions only in memory but on two nodes: < 600 µs (more predictable)
      - Concept: Design to the strengths of the cloud (redundancy) rather than its weaknesses (shared anything)
  • 9. Design choices for a cloud database
      - No disks (in-memory delta tables + async snapshots)
      - Multi-master replication
          - Two copies of the data
          - Load balancing both reads and (monotonic) writes
          - (Eventual) consistency achieved via MVCC (+ Paxos, later)
      - High-end hardware
          - Nehalem for high memory bandwidth
          - Fast interconnect
      - Virtualization
          - Ease of deployment/administration
          - Consolidation/multi-tenancy
  • 10. Why consolidation?
      - In-memory column databases are ideal for mixed workload processing
      - But: In a SaaS environment it seems costly to give everybody their private NewDB box
      - How much consolidation is possible?
          - 3 years' worth of sales records from our favorite Fortune 500 retail company
          - 360 million records
          - Less than 3 GB in compressed columns in memory
          - Next door is a machine with 1 TB of DRAM
          - (Beware of overhead)
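The consolidation argument on this slide is simple arithmetic. The sketch below works it through with decimal units and treats the 3 GB figure as an upper bound. The `usable_fraction` parameter is a hypothetical stand-in for the overhead the slide warns about; the talk gives no value for it.

```python
GB = 10**9
TB = 10**12


def bytes_per_record(total_bytes: int, records: int) -> float:
    """Average compressed size of one record."""
    return total_bytes / records


def tenants_per_server(dram_bytes: int, tenant_bytes: int,
                       usable_fraction: float = 1.0) -> int:
    """Number of tenants of the given size that fit into DRAM.

    usable_fraction is a hypothetical knob for working memory and other
    overhead; 1.0 means that all DRAM holds tenant data.
    """
    return int(dram_bytes * usable_fraction // tenant_bytes)


if __name__ == "__main__":
    # 360 million records in at most 3 GB of compressed columns.
    print(bytes_per_record(3 * GB, 360_000_000))
    # Upper bound on such tenants for a 1 TB machine, ignoring overhead.
    print(tenants_per_server(1 * TB, 3 * GB))
    # The same machine if only half of the DRAM held tenant data.
    print(tenants_per_server(1 * TB, 3 * GB, usable_fraction=0.5))
```

Under these assumptions a record averages under 9 bytes, and a 1 TB machine could hold a few hundred tenants of this size before overhead is taken into account.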
  • 11. Multi-tenancy in the database – four different options
      - No multi-tenancy – one VM per tenant
          - Ex.: RightNow has 3000 tenants in 200 databases (2007): 3000 vs. 200 Amazon VMs cost $2,628,000 vs. $175,200/year
          - Very strong isolation
      - Shared machine – one database process per tenant
          - Scheduler, session manager and transaction manager need to live inside the individual DB processes: IPC for synchronization
          - Good for custom extensions, good isolation
      - Shared process – one schema instance per tenant
          - Must support large numbers of tables
          - Must support online schema extension and evolution
      - Shared table – use a tenant_id column and partitioning
          - Bad for custom extensions, bad isolation
          - Hard to backup/restore/migrate individual tenants
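The two yearly cost figures in the RightNow example can be reproduced with a flat hourly VM price. The sketch below assumes 10 cents per VM-hour and 8,760 hours per year. That rate is inferred from the slide's totals and is not stated in the talk.

```python
HOURS_PER_YEAR = 24 * 365

# Inferred rate: both totals on the slide match 10 cents per VM-hour.
ASSUMED_CENTS_PER_VM_HOUR = 10


def yearly_cost_dollars(vms: int,
                        cents_per_vm_hour: int = ASSUMED_CENTS_PER_VM_HOUR) -> int:
    """Cost in whole dollars of running `vms` virtual machines for a year."""
    total_cents = vms * cents_per_vm_hour * HOURS_PER_YEAR
    return total_cents // 100


if __name__ == "__main__":
    one_vm_per_tenant = yearly_cost_dollars(3000)
    one_vm_per_database = yearly_cost_dollars(200)
    print(one_vm_per_tenant, one_vm_per_database)
    # Consolidating 3000 tenants into 200 databases cuts the bill 15-fold.
    print(one_vm_per_tenant / one_vm_per_database)
```

Integer cents are used to avoid floating-point rounding in the totals.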
  • 12. Putting it all together: Rock cluster architecture
      [Figure: architecture diagram. Labels recovered from the garbled text: Application Server (x2), OLTP System, Importer, Cluster Master, Router (x2), Adapter (x3), TREX (x3)]
      - Importer: extract data from external system
      - Cluster Master: cluster membership, tenant placement
      - Router: load balance between replicas
      - Adapter: forward writes to other replicas
  • 13. Second question: How predictable is the behavior of an in-memory column database?
  • 14. What does “predictable” mean?
      - Traditionally, database people are concerned with questions of the type “how do I make a query faster?”
      - In a SaaS environment, the question is “how do I get a fixed (low) response time as cheaply as possible?”
          - Look at throughput
          - Look at quantiles (e.g. the 99th percentile)
      - Example formulation of desired performance:
          - Response time goal “1 second in the 99th percentile”
          - Average response time around 200 ms
          - Less than 1% of all queries exceed 1,000 ms
          - Results in a maximum number of concurrent queries before the response time goal is violated
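A response time goal such as "1 second in the 99th percentile" can be checked mechanically against measured response times. The sketch below uses the nearest-rank definition of a percentile. The talk does not say which definition was used, so that choice and the function names are assumptions of this example.

```python
def percentile(samples, q: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of q * n / 100, so no floating-point rounding is involved.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def meets_goal(response_times_ms, goal_ms: float = 1000.0, q: int = 99) -> bool:
    """True if the q-th percentile response time is within the goal."""
    return percentile(response_times_ms, q) <= goal_ms


if __name__ == "__main__":
    # 99 fast queries and one slow outlier: the goal still holds.
    print(meets_goal([200] * 99 + [5000]))
    # Two slow queries out of 100: the goal is violated.
    print(meets_goal([200] * 98 + [5000] * 2))
```

A quantile goal tolerates rare outliers but fails as soon as more than 1% of queries are slow, which matches the formulation on the slide.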
  • 15. System capacity
      - Fixed amount of data split equally among all tenants
      - Test setup (from a text excerpt shown on the slide, cropped at its right edge): each test contained several tenants of a particular tenant size so that 20% of the available main memory was always used for tenant data. The data was preloaded into main memory before each test run, and a six-minute warm-up was performed before each run. The test data was generated by the Star Schem… Data Generator [36].
      - [Figure: requests/s versus tenant size in MB; series "Measured" and "Approx. Function"]
      - Figure 4.2 of the excerpt shows the same graph using a logarithmic scale. This graph is linear, which means that the relation can be expressed as:
            $\log(f(t_{Size})) = m \cdot \log(t_{Size}) + n$
        where n is the intercept with the y-axis, f(0), and m is the gradient. Both can be estimated using regression and the least-squares method (see chapter 2.3.2 of the excerpt) or simply by using a slope triangle. The estimates are m ≈ −0.945496 and f(0) = n ≈ 3.6174113.
      - Capacity corresponds to bytes scanned per second (there is a small overhead when processing more requests)
      - In-memory databases behave very linearly!
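The fitted line in log-log space is a power law in ordinary units, which the sketch below evaluates using the gradient and intercept quoted on the slide. The base of the logarithm is not visible in the excerpt. Base 10 is assumed here because it gives request rates of the same order as the plotted range; the function name is illustrative.

```python
import math

# Gradient and intercept as quoted on the slide.
M = -0.945496
N = 3.6174113


def requests_per_second(tenant_size_mb: float, m: float = M, n: float = N) -> float:
    """Evaluate log10(f(size)) = m * log10(size) + n for f(size).

    Base 10 is an assumption; the excerpt does not state the base.
    """
    return 10 ** (m * math.log10(tenant_size_mb) + n)


if __name__ == "__main__":
    for size in (20, 40, 100, 200):
        rate = requests_per_second(size)
        # size * rate approximates the data volume scanned per second.
        print(f"{size:4d} MB  {rate:7.1f} req/s  {size * rate:8.0f} MB/s scanned")
```

Because m is close to −1, doubling the tenant size roughly halves the sustainable request rate, so the product of size and rate stays nearly constant. This is the sense in which capacity corresponds to bytes scanned per second; the small deviation of m from −1 reflects the per-request overhead mentioned on the slide.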
  • 16. Workload
      - Tenants generally have different rates and sizes
      - For a given set of T tenants (on one server) define the workload [formula not preserved in the transcript]
      - When Workload = 1
          - System runs at its maximum throughput level
          - Further increase of workload will result in violation of the response time goal
  • 17. Response time
      - Different amounts of data and different request rates (“assorted mix”)
      - Workload is varied by scaling the request rates
      [Figure: 99th-percentile response time in ms versus workload (0.2 to 1.1); series for tenant data of 1.5 GB, 2.0 GB, 2.6 GB and 3.2 GB, plus the prediction]
  • 18. Impact of writes
      - Added periodic batch writes (fact table grows by 0.5% every 5 minutes)
      [Figure: 99th-percentile response time in ms versus workload (0.2 to 1.1); series "Without Writes", "With Writes", "Prediction without Writes" and "Prediction with Writes"]
  • 19. Why is predictability good?
      - Ability to plan and perform resource-intensive tasks during normal operations:
          - Upgrades
          - Merges
          - Migrations of tenants in the cluster (e.g. to dynamically re-balance the load situation in the cluster)
      [Figure: cost breakdown for migration of tenants]
  • 20. Definition: Cloud Computing = Data Center + API
  • 21. Third question: Does virtualization have a negative impact on in-memory databases?
  • 22. Impact of virtualization
      - Run multi-tenant OLAP benchmark on either:
          - one TREX instance directly on the physical host, or
          - one TREX instance inside a VM on the physical host
      - Overhead is approximately 7% (both in response time and throughput)
      [Figure: two panels plotted against the number of client threads (0 to 12), one for average response time and one for queries per second, each comparing "virtual" and "physical"]
  • 23. Impact of virtualization (contd.)
      - Virtualization is often used to get “better” system utilization
          - What happens when a physical machine is split into multiple VMs?
          - Burning CPU cycles does not hurt → memory bandwidth is the limiting factor
      [Figure: response time as a percentage of the response time with 1 active slot (80% to 160%) versus concurrently active VM slots (1 to 4); series "Xeon E5450" and "Xeon X5650"]
  • 24. Fourth question: How do I assign tenants to servers in order to manage fault tolerance and scalability?
  • 25. Why is it good to have multiple copies of the data?
      - Scalability beyond a certain number of concurrently active users
      - High availability during normal operations
      - Alternating execution of resource-intensive operations (e.g. merge)
      - Rolling upgrades without downtime
      - Data migration without downtime
      - Reminder: Two in-memory copies allow faster writes and are more predictable than one in-memory copy plus disk
      - Downsides:
          - Response time goal might be violated during recovery
          - You need to plan for twice the capacity
  • 26. Tenant placement
      [Figure: tenants T1 to T6 placed on servers in two layouts]
      - Conventional mirrored layout: If a node fails, all work moves to one other node. The system must be 100% over-provisioned.
      - Interleaved layout: If a node fails, work moves to many other nodes. Allows higher utilization of nodes.
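The difference between the two layouts shows up when the load on the surviving servers is computed after a failure. The sketch below is a toy model, not the talk's evaluation setup: every tenant has two copies and a load of 1.0 that is split evenly across its live copies. The function names and the 4-server, 6-tenant example are assumptions chosen for illustration.

```python
from itertools import combinations


def mirrored(num_servers: int, tenants_per_pair: int) -> dict:
    """Servers form fixed pairs; both servers of a pair hold the same tenants."""
    layout = {}
    tenant = 0
    for first in range(0, num_servers, 2):
        for _ in range(tenants_per_pair):
            layout[tenant] = (first, first + 1)
            tenant += 1
    return layout


def interleaved(num_servers: int) -> dict:
    """Each pair of servers shares exactly one tenant."""
    return dict(enumerate(combinations(range(num_servers), 2)))


def server_loads(layout: dict, failed=None) -> dict:
    """Load per live server; each tenant contributes 1.0 across its live copies."""
    loads = {}
    for servers in layout.values():
        alive = [s for s in servers if s != failed]
        for s in alive:
            loads[s] = loads.get(s, 0.0) + 1.0 / len(alive)
    return loads


if __name__ == "__main__":
    for name, layout in (("mirrored", mirrored(4, 3)),
                         ("interleaved", interleaved(4))):
        normal = max(server_loads(layout).values())
        degraded = max(server_loads(layout, failed=0).values())
        print(f"{name:12s} peak load normal={normal} after failure={degraded}")
```

In this toy model both layouts put a load of 1.5 on every server in normal operation. After one failure, the mirrored layout doubles the load on the surviving partner to 3.0, while the interleaved layout raises the peak only to 2.0 because the failed server's tenants are spread over three other servers.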
  • 27. Handcrafted best case
      - Perfect placement:
          - 100 tenants
          - 2 copies/tenant
          - All tenants have the same size
          - 10 tenants/server
      - Perfect balancing (same load on all tenants):
          - 6M rows (204 MB compressed) of data per tenant
          - The same (increasing) number of users per tenant
          - No writes
      [Figure: example placements of tenants 1 to 9 in the mirrored and interleaved layouts]
      Throughput before violating response time goal:
                                    Mirrored      Interleaved   Improvement
      No failures                   4218 users    4506 users    7%
      Periodic single failures      2265 users    4250 users    88%
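The improvement column follows directly from the user counts in the table. The sketch below recomputes it and also derives how much throughput each layout retains under periodic failures. The retained-throughput figures are not on the slide; they are computed here from the slide's numbers. The helper names are illustrative.

```python
# Users served before the response time goal is violated, from the slide.
USERS = {
    "no_failures": {"mirrored": 4218, "interleaved": 4506},
    "periodic_single_failures": {"mirrored": 2265, "interleaved": 4250},
}


def improvement_pct(baseline: int, improved: int) -> int:
    """Relative gain of `improved` over `baseline`, rounded to whole percent."""
    return round((improved / baseline - 1) * 100)


def retained_pct(layout: str) -> float:
    """Share of failure-free throughput that survives periodic failures."""
    return 100 * (USERS["periodic_single_failures"][layout]
                  / USERS["no_failures"][layout])


if __name__ == "__main__":
    for scenario, row in USERS.items():
        print(scenario, improvement_pct(row["mirrored"], row["interleaved"]))
    for layout in ("mirrored", "interleaved"):
        print(layout, round(retained_pct(layout), 1))
```

By these figures the mirrored layout keeps only about half of its failure-free throughput under periodic failures, while the interleaved layout keeps more than 90%.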
  • 28. Requirements for placement algorithm
      - An optimal placement algorithm needs to cope with multiple (conflicting) goals:
          - Balance load across servers
          - Achieve good interleaving
      - Use migrations consciously for online layout improvements (no big bang cluster re-organization)
      - Take usage patterns into account
          - Request rates double during the last week before the end of the quarter
          - Time zones, Christmas, etc.
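To illustrate how the first two goals can be traded off, here is a minimal greedy sketch. It is a hypothetical heuristic written for this transcript, not the placement algorithm from the talk, and it ignores migrations and usage patterns. Each tenant gets two copies on the server pair that currently shares the fewest tenants, with ties broken by the lower load.

```python
from collections import Counter
from itertools import combinations


def place(tenant_loads: dict, num_servers: int):
    """Greedily assign two copies per tenant to a pair of servers.

    Hypothetical heuristic: prefer pairs that share few tenants (interleaving),
    then pairs whose servers carry little load (balance).
    """
    load = [0.0] * num_servers
    shared = Counter()  # number of tenants already shared by each server pair
    placement = {}
    # Place heavy tenants first so that light ones can even out the load.
    for tenant, tenant_load in sorted(tenant_loads.items(),
                                      key=lambda item: -item[1]):
        def cost(pair):
            a, b = pair
            return (shared[pair], max(load[a], load[b]), load[a] + load[b])

        best = min(combinations(range(num_servers), 2), key=cost)
        placement[tenant] = best
        shared[best] += 1
        for server in best:
            load[server] += tenant_load / 2  # each copy serves half the load
    return placement, load


if __name__ == "__main__":
    tenants = {f"T{i}": 1.0 for i in range(1, 7)}
    placement, load = place(tenants, num_servers=4)
    print(placement)
    print(load)
```

For six equally loaded tenants on four servers, this heuristic uses every server pair exactly once and leaves all servers equally loaded. With uneven tenant loads the two goals start to conflict, which is where the ordering of the cost tuple becomes a real design decision.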
  • 29. Conclusion
      - Answers to four questions:
          - Why are memory-based architectures great for cloud computing?
          - How predictable is the behavior of an in-memory column database?
          - Does virtualization have a negative impact on in-memory databases?
          - How do I assign tenants to servers in order to manage fault tolerance and scalability?
      - Questions?