SlideShare une entreprise Scribd logo
1  sur  68
Télécharger pour lire hors ligne
DIVIDE AND CONQUER IN THE CLOUD:
ONE BIG SERVER OR MANY SMALL ONES?
                          Justin Swanhart
                           FOSDEM 2013
                            PLMCE 2013
Agenda
•   Introduction
•   The new age of multi-core (terminology and background)
•   Enter Shard-Query
•   Performance comparison
•   Q/A




                                                  www.percona.com
Introduction


           www.percona.com
Who am I?

My name is Justin “Swany” Swanhart


      I’m a trainer at Percona




                             www.percona.com
FOSS I maintain

• Shard-Query

    • MPP distributed query middleware for MySQL*

    • Work is mostly divided up using sharding and/or
      table partitioning (divide)

    • Distributes work over many machines in parallel
      (conquer)




                                         www.percona.com
FOSS I maintain

• Flexviews

    • Caches result sets and can refresh them
      efficiently based on only the database changes
      since the last refresh

    • Refresh cost is directly proportional to the
      number of changes to the base data




                                            www.percona.com
This talk is about Shard-Query




                          www.percona.com
PERFORMANCE IN THE NEW AGE
      OF MULTI-CORE




                    www.percona.com
The new age of multi-core

       CPU                   CPU
Core         Core     Core     Core

Core

Core     Core
             Core     Core

                      Core
                               Core

                               Core
                                      “If your time to you is worth saving,
Core     Core         Core     Core       then you better start swimming.
       CPU                   CPU                 Or you'll sink like a stone.
Core

Core
         Core

         Core
                      Core

                      Core
                               Core

                               Core
                                       For the times they are a-changing.”
Core

Core
         Core

         Core
                      Core

                      Core
                               Core

                               Core
                                                                - Bob Dylan



                                                         www.percona.com
Moore’s law
• The number of transistors in a CPU doubles
  every 24 months
• In combination with other component
  improvements, this allowed doubling of CPU
  clock speed approximately every 18 months
• Frequency scaling beyond a few GHz has
  extreme power requirements
• Increased power = increased heat


                                     www.percona.com
Moore’s law today
• Speed now doubles more slowly: 36 months
• Power efficient multiple-core designs have
  become very mature
• Larger number of slower cores which use less
  power in aggregate




                                      www.percona.com
Question:


Is multi-core faster?



                 www.percona.com
Answer: It depends

A program is only faster on a
multiple core CPU if it can use
     more than one core



                         www.percona.com
Why?
1. A physical CPU may have many logical cpu
2. Each logical CPU runs at most one thread at
   one time
3. Max running threads:
 1. Physical CPU count x CPU core density x threads per core
    •   Dual CPU with 3 HT cores = 2 x 3 x 2




                                                www.percona.com
What is a thread?
• Every program uses at least one thread
• A multithreaded program can do more work
  • But only if it can split the work into many
    threads
• If it doesn’t split up the work: then there is no
  speedup.




                                          www.percona.com
What is a task?
• An action in the system which results in work
  • A login
  • A report
  • An individual query


• Tasks are single or multi-threaded
     • Tasks may have sub-tasks




                                       www.percona.com
Response Time
• Time to complete a task in wall clock time
• Response time includes the time to complete
  all sub-tasks
  • You must include queue time in response time




                                         www.percona.com
Throughput
• How many tasks complete in a given unit of
  time
• Throughput is a measure of overall work done
  by the system




                                     www.percona.com
Throughput vs Response time


Response time = Time / Task

Throughput = Tasks / Time


                      www.percona.com
Efficiency

• This is a score about how well you scheduled the
  work for your task. Inefficient workloads
  underutilize resources
                 Resources Used
            (   Resources Available
                                      x 100 )




                                                www.percona.com
Efficiency Example
                     Resources Used
                (   Resources Available
                                          x 100 )


• CPU bound workload given 8 available cores
  •   When all work is done in single thread:12.5% CPU efficiency
  •   If the work can be split into 8 threads: 100% CPU efficiency




                                                    www.percona.com
Lost time is lost money
•   You are paying for wall clock time
    •   Return on investment is directly proportional to efficiency.
•   You can’t bank lost time
•   If you miss an SLA or response time objective you lost the time
    investment, plus you may have to pay an penalty
•   It may be impossible to get critical insight




                                                                 www.percona.com
Scalability
       When resources are added what happens to efficiency?


• Given the same workload, if throughput does
  not increase:
  • Adding even more resources will not improve
    performance
  • But you may have the resources for more
    concurrent tasks
  • This is the traditional database scaling model




                                                    www.percona.com
Scalability
     When resources are added what happens to efficiency?


• If throughput increases
  • The system scales up to the workload
  • When people ask if a system is scalable, this is usually
    what they want to know:




                                                 www.percona.com
Scalability
       When resources are added what happens to efficiency?


• If throughput goes down there is negative
  scalability
  • Mutexes are probably the culprit
  • This is the biggest contention point for databases
    with many cores
  • This means you can’t just scale up a single
    machine forever. You will always hit a limit.




                                                   www.percona.com
Response time is very important!

• Example:
  • It takes 3 days for a 1 day report
    • It doesn’t matter if you can run 100 reports at once
    • Response time for any 1 report is too high if you need to
      make decisions today about yesterday




                                                  www.percona.com
Workload
• A workload consists of many tasks
• Typical database workloads are categorized by
  the nature of the individual tasks




                                      www.percona.com
OLTP
• Frequent highly concurrent reads and writes of
  small amounts data per query.
• Simple queries, little aggregation
• Many small queries naturally divide the work
  over many cores




                                       www.percona.com
Known OLTP scalability issues
• Many concurrent threads accessing critical
  memory areas leads to mutex contention
• Reducing global mutex contention main dev
  focus
• Mutex contention is still the biggest bottleneck
  • Prevents scaling up forever (32 cores max)




                                          www.percona.com
OLAP (analytical queries)
• Low concurrency reads of large amounts of
  data
• Complex queries and frequent aggregation
• STAR schema common (data mart)
• Single table (machine generated data) common
• Partitioning very common




                                    www.percona.com
Known OLAP scalability issues
   • IO bottleneck usually gets hit first
   • However, even if all data is in memory it still
     may be impossible to reach the response time
     objective
       • Queries may not be able to complete fast enough
         to meet business objectives because:
            • MySQL only supports nested loops*
            • All queries are single threaded


* MariaDB has limited support for hash joins


                                                  www.percona.com
You don’t need a bigger boat
• Buying a bigger server probably won’t help for
  individual queries that are CPU bound.
• Queries are still single threaded.




                                       www.percona.com
You need to change the
      workload!




                    www.percona.com
You need to change the
              workload!
• Turn OLAP into something more like OLTP
  • Split one complex query into many smaller queries
  • Run those queries in parallel
  • Put the results together so it looks like nothing
    happened
  • This leverages multiple cores and multiple servers




                                              www.percona.com
Life sometimes give you lemons




                        www.percona.com
Enter Shard-Query
• Shard-Query transparently splits up complex
  SQL into smaller SQL statements (sub tasks)
  which run concurrently
  • Proxy
  • REST / HTTP GUI
  • PHP OO interface




                                    www.percona.com
Why not get a different database?

• Because MySQL is a great database and
  sticking with it has advantages
        • Operations knows how to work with it (backups!)
        • It is FOSS (alternatives are very costly or very limited)
        • MySQL’s special dialect means changes to apps to move
          to an alternative database
        • The alternatives are either a proprietary RDBMS* or
          map/reduce. For many reasons these are undesirable.




* Vertica, Greenplum, Vectorwise, AsterData, Teradata, etc…
                                                              www.percona.com
Smaller queries are better queries
• This is closer to the OLTP workload that the
  database has been optimized for
  • Smaller queries use smaller temporary tables
    resulting in more in-memory aggregation
  • This reduces the usage of temporary tables on disk
  • Parallelism reduces response time




                                          www.percona.com
Data level parallelism
• Table level partitioning
  • One big table is split up into smaller tables
    internally
  • Reduce IO and contention on shared data
    structures in the database.
  • MySQL could operate on partitions in parallel
     • But it doesn’t




                                             www.percona.com
Data level parallelism (cont)
• Sharding
  • Horizontally partition data over more than one
    server.
  • Shards can naturally be operated on in parallel.
  • This is called shared-nothing architecture, or data
    level parallelism.
  • This is also called scaling out.




                                            www.percona.com
Shard-Query




              www.percona.com
Not just for sharding
• The name is misleading as you have choices:
  • Partition your data on one big server to scale up


  • Shard your data onto multiple servers to scale out


  • Do both for extreme scalability

    • Or neither, but with less benefits



                                           www.percona.com
Parallelism of SQL dataflow
  • Shard-Query can add parallelism and improve
    query performance based on SQL language
    constructs:
       • IN lists and subqueries are parallelized
       • BETWEEN on date or integer operands adds
         parallelism too
       • UNION ALL and UNION queries are parallelized*
       • Uncorrelated subqueries are materialized early, are
         indexed and run in parallel (inside out execution)
* They have to have features necessary to enable parallelism.
  You really need to partition and shard for best results
                                                                www.percona.com
Shard-Query is not for OLTP
• Parser and execution strategy is “relatively
  expensive”
  •   For OLAP this time is a small fraction of the overall
      execution time and the cost is “cheap”
  • Tying to turn OLTP into OLTP is wasted effort and
    waste reduces efficiency
  •   But OLTP still caps out at 32 cores on a single box




                                                      www.percona.com
Sharding for OLTP

• Use Shard-Key-Mapper
  •   Use this helper class to figure out which shard to use
  •   Then send queries directly to that shard (bypass parser)
  •   This could be made transparent with mysqlnd plugins


• You can then still use SQ when complex query
  tasks are needed

• This is a topic for another day
                                                    www.percona.com
Why: Amdahl’s Law
• Speedup of parallelizing a task is bounded
  by the amount of serialized work the task
  must perform
  f(N) = maximum speedup by parallelizing a process
                 1          P
  f(N) =                +
             (1–P)          N


  N = Number of threads that may be run in parallel
  P = Portion in percent, of the runtime of the program that may be parallelized
  (1 – P) = Portion in percent, of the runtime which is serialized




                                                                    www.percona.com
1           P
Amdahl’s Law Example                                    f(N) =
                                                                   (1–P)
                                                                             +
                                                                                   N



Your task: SUM a set of 10000 items
  • Serially summing the items takes 10000* seconds


  • Solution: Partition the set into 10 subsets
    • Sum items using one thread per partition

  * Times are exaggerated for demonstration purposes.




                                                                 www.percona.com
1           P
 Amdahl’s Law Example                                 f(N) =
                                                                 (1–P)
                                                                           +
                                                                                 N



 • Summing 10 sets items in parallel takes 1000
   seconds (10000s/10=1000s)
    • Add an extra 10 seconds to add up the 10 sub-
      results
        • This is .1% of the original runtime


     Max speedup = (.1–99.9))+(99.0/10) = 9.98X



* Times are exaggerated for demonstration purposes.
                                                               www.percona.com
Amdahl’s Law Example 2
Your task: STDDEV a set of 10000 items
  • Some aggregate functions like STDDEV can’t be
    distributed like SUM
  • Serially doing a STDDEV takes 10000 seconds
  • Because it can’t be split, we are forced to use one
    thread
  • The STDDEV takes 10000 second (10000/1)
  • Max speedup = 1.0 (no speedup)
  • Overhead may reduce efficiency


                                            www.percona.com
You get to move all the pieces at the same time

                   T32

      T1                                T1



                   T48

      T4                                T4



                         T64            T8

      T8




                                       www.percona.com
Shard-Query 2.0 Features
•   Automatic sharding and massively parallel loading
    •   Add new shards then spread new data over them
        automatically
•   Long query support
    •   Supports asynchronous jobs for running long queries
• Complex query support
    •   GROUP BY, HAVING, LIMIT, ORDER BY, WITH ROLLUP,
        derived tables, UNION, etc




                                                    www.percona.com
Shard-Query 2.0 Features
•   Automatic sharding and massively parallel loading
    •   Add new shards then spread new data over them
        automatically
•   Long query support
    •   Supports asynchronous jobs for running long queries




                                                    www.percona.com
Shard-Query 2.0

                                 Sharding
       Shard-                     and/or         Parallel
                +   Gearman   + Partitioned =
       Query                                    Execution
                                  Tables

                     Task1         Shard1        Partition 1



SQL                  Task2         Shard1        Partition 2
         GUI
        Proxy
       PHP OO
DATA                 Task3         Shard2        Partition 1


                     Task4         Shard2        Partition 2




                                                Data Flow

                                                      www.percona.com
Code Maturity
•   Revision 1, May 22, 2010 0.0.1
    •   270 lines of PHP code checked into Google Code
    •   Limited select support, no aggregation pushdown
    •   One developer (me) – only suited for limited audience as a POC
    •   Not object oriented, or a framework, just a simple CLI PHP app


•   Revision 447, Jan 28, 2013 – Shard Query beta 2.0.0
    •   Over 9700 lines of PHP code
    •   Full coverage of SELECT statement plus custom aggregate function support
    •   Support for all almost DML and DDL operations (except create/drop database)
    •   Two additional developers
         •   Special thanks to Alex Hurd who is a production tester and contributor of the REST interface
         •   And Andre Rothe, who contributes to the SQL parser
    •   REST web interface, MySQL proxy Lua script, OO PHP library framework
    •   Feature complete and stable to a level fit for community use


                                                                                            www.percona.com
One big server or many small
            ones




                       www.percona.com
Big Data: Wikipedia Traffic Stats
• Wikipedia traffic stats
    • 2 months of data (Feb and Mar 2008)
    • 180 GB raw data (single table)
         •   No PK
         •   Partitioned by date, one partition per day
         •   One index, on the wiki language (“en”, etc)
         •   360 GB data directory after loading
              • 7 years of data is available (15TB)



http://dumps.wikimedia.org/other/pagecounts-raw/


                                                           www.percona.com
Big Data: Wikipedia Traffic Stats
CREATE TABLE `fact` (
  `date_id` int(11) NOT NULL,                              Data set is sharded by
  `hour_num` tinyint(4) NOT NULL,                          hour_num to evenly
  `page` varchar(1024) NOT NULL,                           distribute work between
  `project` char(3) NOT NULL,                              servers (each server has
  `language` varchar(12) NOT NULL,                         1/8 of the data)
  `cnt` bigint(20) NOT NULL,
  `bytes` bigint(20) NOT NULL,
  KEY `language` (`language`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE (date_id)                                 SQ can split tasks over
(PARTITION p1 VALUES LESS THAN (20080101) ENGINE = InnoDB,   partitions too, for multiple
 PARTITION p2 VALUES LESS THAN (20080102) ENGINE = InnoDB,   tasks per shard
PARTITION p3 VALUES LESS THAN (20080103) ENGINE = InnoDB,
…,
PARTITION p91 VALUES LESS THAN (20080331) ENGINE = InnoDB,
PARTITION p92 VALUES LESS THAN (20080401) ENGINE = InnoDB,
PARTITION pmax VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB,
)



                                                                   www.percona.com
EC2 Instance Sizes
                        Instance Sizes Compared
 Instance
              Feature           Measure                Price Per hour
   Size
                     cores          2                    0.41 USD
                       ecu        6.5
m2.xlarge
                  ecu/core       3.25
The Pawn
                   storage        420
                  memory         17.1
                     cores         16                     3.1 USD
                       ecu         35
hs1.8xlarge
                  ecu/core       2.25
 The King
                   storage       2048                                   For a few cents more we
                  memory         60.5                                   get much more CPU
                                                                        power in aggregate if
                               Scale Out                                eight smaller machines
 Instance                          1x           8x      8X Price Per
   Size
              Feature
                                Machine     Machines       hour         are used
                     cores          2           16       3.28 USD
                       ecu        6.5           52
m2.xlarge
                  ecu/core       3.25         3.25
The Pawn
                   storage        420         3360                      Working set fits into
                  memory         17.1        136.8
                     cores         16          128                      the ram of 8 servers
                       ecu         35          280
hs1.8xlarge
                  ecu/core       2.25         2.25
 The King
                   storage       2048        16384
                  memory         60.5          484




                                                                             www.percona.com
Anybody want to bet on the King?




                        www.percona.com
Working Set
• The queries examine up to 15 days of data
         • It is important that the working set be able to be kept in
           memory for best performance
         • Each partition contains approximately 8GB of data
         • Working set is 120GB of InnoDB pages (60GB raw)
              • Doesn’t fit on a single server



http://dumps.wikimedia.org/other/pagecounts-raw/




                                                         www.percona.com
Don’t bet on the king…
Single-Threaded select count(*)
Query             from stats.fact
Performance the where date_id = 20080214;                                            8 shards should
unsharded       +----------+                                                         be able to do the work
                | count(*) |                                                         In ~18s or ~2 seconds,
data set
               +----------+                                                          depending on where
               | 84795020 |                                                          the bottleneck is.
               +----------+

                               Instance Sizes Compared (again)
                Instance                              Performance
                             Feature        Measure                 Price Per hour
                  Size                                COLD / HOT
                                    cores       2                     0.41 USD
                                      ecu     6.5
               m2.xlarge
                                 ecu/core    3.25     2m22s / 33s      .21/core
               The Pawn
                                  storage     420
                                 memory      17.1
                                    cores      16                      3.1 USD
                                      ecu      35
               hs1.8xlarge
                                 ecu/core    2.25     1m10s / 43s      .19/core
                The King
                                  storage    2048
                                 memory      60.5




                                                                                       www.percona.com
Simple In-Memory COUNT(*) query performance

800
                                                                                                  Days    8 Pawns    The King
                                                                      750.457135

700                                                                                                1      2.552858   40.84573
                                                           643.0426209                             2      5.090356   81.4457
600
                                                                                                   3      8.064888   129.0382

                                                  526.9528692                                      4      10.74412   171.9059
500
                                                                                                   5      13.32697   213.2316
                                                                              8 Pawns
                                                                              The King             6      16.0227    256.3633
400                                      405.4624271
                                                                              Linear (8 Pawns)     7      18.50571   296.0914
                                                                              Linear (The King)
                                                                                                   8      21.02053   336.3285
300                             296.091397
                                                                                                   9      25.3414    405.4624

                        213.2315872                                                                10     29.69324   475.0918
200
                                                                                                   11     32.93455   526.9529
               129.0382018                                                                         12     36.5517    584.8272
100

40.87
                                                                                                   13     40.19016   643.0426
                                                                      44.69
2.552858055                                             40.19016381
                                               32.93455432
                                      25.3414017                                                   14      42.75     699.1011
   0                         18.50571232
                      13.3269742
            8.064887613
                                                                                                   15      44.69     750.4571




                                                                                                         www.percona.com
Where to go from here
• Shard-Query is not the answer to every performance problem

   • Flexviews can be used too
   • This helps when the working set is too large to keep in memory
     even with many machines
   • Cost of aggregation is amortized over time, and distributed
     over many machines if you still shard
   • Scales reads, writes, aggregation


• But it is an excellent addition to your toolbox




                                                    www.percona.com
Distributed row store w/ Galera
• Each shard is an Percona XtraDB or MariaDB cluster

• Support massive ingestion rates via MPP loader

• real time ad-hoc complex querying

• All the components support HA
   • Galera, Gearman, Apache, PHP, MySQL proxy

• Redundancy can be fully geographically distributed

• Use partitioning at the table level too



                                                       www.percona.com
Distributed document store
• Each shard is a MariaDB cluster

• Use dynamic columns and anchor models for dynamic schema




                                               www.percona.com
Distributed column store
• Each shard is a Infobright database

• Pros

   • Hash joins, small data footprint, no indexes

• Cons

  • Single threaded loading (one thread per shard, can’t take
    advantage of MPP loader)

  • Append only (LOAD DATA INFILE)

  • No partitions
     • SQL harder to parallelize for scale up
                                                    www.percona.com
justin.swanhart@percona.com
                   @jswanhart



We're Hiring! www.percona.com/about-us/careers/
I’m also speaking at:




  1. Detecting and preventing SQL injection with the Percona Toolkit
  2. Divide and Conquer in the cloud: One big server or many small ones?
  3. Introduction to open source column stores
                                                                           PLMCE.com

                                                                April 2013

Contenu connexe

Tendances

Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msIlya Ganelin
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best PracticesVenu Anuganti
 
Empowering developers to deploy their own data stores
Empowering developers to deploy their own data storesEmpowering developers to deploy their own data stores
Empowering developers to deploy their own data storesTomas Doran
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQLKonstantin Gredeskoul
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoopmarkgrover
 
Impala presentation
Impala presentationImpala presentation
Impala presentationtrihug
 
Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)Yukinori Suda
 
Impala Resource Management - OUTDATED
Impala Resource Management - OUTDATEDImpala Resource Management - OUTDATED
Impala Resource Management - OUTDATEDMatthew Jacobs
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...DataStax
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera, Inc.
 
MySQL Performance Tuning
MySQL Performance TuningMySQL Performance Tuning
MySQL Performance TuningFromDual GmbH
 
Deploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopDeploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopAllen Wittenauer
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraNguyen Quang
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalabilityjbellis
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
Simple Works Best
 Simple Works Best Simple Works Best
Simple Works BestEDB
 
Impala 2.0 Update #impalajp
Impala 2.0 Update #impalajpImpala 2.0 Update #impalajp
Impala 2.0 Update #impalajpCloudera Japan
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopCloudera, Inc.
 
Scaling Hibernate with Terracotta
Scaling Hibernate with TerracottaScaling Hibernate with Terracotta
Scaling Hibernate with TerracottaAlex Miller
 
Introduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationIntroduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationTim Callaghan
 

Tendances (20)

Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2ms
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
Empowering developers to deploy their own data stores
Empowering developers to deploy their own data storesEmpowering developers to deploy their own data stores
Empowering developers to deploy their own data stores
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
Impala presentation
Impala presentationImpala presentation
Impala presentation
 
Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)
 
Impala Resource Management - OUTDATED
Impala Resource Management - OUTDATEDImpala Resource Management - OUTDATED
Impala Resource Management - OUTDATED
 
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
 
MySQL Performance Tuning
MySQL Performance TuningMySQL Performance Tuning
MySQL Performance Tuning
 
Deploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopDeploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache Hadoop
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
Simple Works Best
 Simple Works Best Simple Works Best
Simple Works Best
 
Impala 2.0 Update #impalajp
Impala 2.0 Update #impalajpImpala 2.0 Update #impalajp
Impala 2.0 Update #impalajp
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
 
Scaling Hibernate with Terracotta
Scaling Hibernate with TerracottaScaling Hibernate with Terracotta
Scaling Hibernate with Terracotta
 
Introduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationIntroduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free Replication
 

Similaire à Divide and conquer in the cloud

Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
Rainbows, Unicorns, and other Fairy Tales in the Land of Serverless Dreams
Rainbows, Unicorns, and other Fairy Tales in the Land of Serverless DreamsRainbows, Unicorns, and other Fairy Tales in the Land of Serverless Dreams
Rainbows, Unicorns, and other Fairy Tales in the Land of Serverless DreamsJosh Carlisle
 
MongoDB Case Study at NoSQL Now 2012
MongoDB Case Study at NoSQL Now 2012MongoDB Case Study at NoSQL Now 2012
MongoDB Case Study at NoSQL Now 2012Sean Laurent
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Scaling apps for the big time
Scaling apps for the big timeScaling apps for the big time
Scaling apps for the big timeproitconsult
 
Leveraging MongoDB: An Introductory Case Study
Leveraging MongoDB: An Introductory Case StudyLeveraging MongoDB: An Introductory Case Study
Leveraging MongoDB: An Introductory Case StudySean Laurent
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Bob Pusateri
 
Bridging the Developer and the Datacenter
Bridging the Developer and the DatacenterBridging the Developer and the Datacenter
Bridging the Developer and the Datacenterlurs83
 
Handling Massive Writes
Handling Massive WritesHandling Massive Writes
Handling Massive WritesLiran Zelkha
 
Webinar: Capacity Planning
Webinar: Capacity PlanningWebinar: Capacity Planning
Webinar: Capacity PlanningMongoDB
 
Tiger oracle
Tiger oracleTiger oracle
Tiger oracled0nn9n
 
Azug - successfully breeding rabits
Azug - successfully breeding rabitsAzug - successfully breeding rabits
Azug - successfully breeding rabitsYves Goeleven
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Bob Pusateri
 
Modern Java Concurrency (OSCON 2012)
Modern Java Concurrency (OSCON 2012)Modern Java Concurrency (OSCON 2012)
Modern Java Concurrency (OSCON 2012)Martijn Verburg
 
Cassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixCassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixJason Brown
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Adrian Cockcroft
 
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...Bob Pusateri
 

Similaire à Divide and conquer in the cloud (20)

Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Rainbows, Unicorns, and other Fairy Tales in the Land of Serverless Dreams
Rainbows, Unicorns, and other Fairy Tales in the Land of Serverless DreamsRainbows, Unicorns, and other Fairy Tales in the Land of Serverless Dreams
Rainbows, Unicorns, and other Fairy Tales in the Land of Serverless Dreams
 
MongoDB Case Study at NoSQL Now 2012
MongoDB Case Study at NoSQL Now 2012MongoDB Case Study at NoSQL Now 2012
MongoDB Case Study at NoSQL Now 2012
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Scaling apps for the big time
Scaling apps for the big timeScaling apps for the big time
Scaling apps for the big time
 
Leveraging MongoDB: An Introductory Case Study
Leveraging MongoDB: An Introductory Case StudyLeveraging MongoDB: An Introductory Case Study
Leveraging MongoDB: An Introductory Case Study
 
Performance stack
Performance stackPerformance stack
Performance stack
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
 
Bridging the Developer and the Datacenter
Bridging the Developer and the DatacenterBridging the Developer and the Datacenter
Bridging the Developer and the Datacenter
 
Handling Massive Writes
Handling Massive WritesHandling Massive Writes
Handling Massive Writes
 
Webinar: Capacity Planning
Webinar: Capacity PlanningWebinar: Capacity Planning
Webinar: Capacity Planning
 
Tiger oracle
Tiger oracleTiger oracle
Tiger oracle
 
Azug - successfully breeding rabits
Azug - successfully breeding rabitsAzug - successfully breeding rabits
Azug - successfully breeding rabits
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
 
Modern Java Concurrency (OSCON 2012)
Modern Java Concurrency (OSCON 2012)Modern Java Concurrency (OSCON 2012)
Modern Java Concurrency (OSCON 2012)
 
Cassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating NetflixCassandra from the trenches: migrating Netflix
Cassandra from the trenches: migrating Netflix
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
 
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
 

Dernier

activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 

Dernier (20)

activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 

Divide and conquer in the cloud

  • 1. DIVIDE AND CONQUER IN THE CLOUD: ONE BIG SERVER OR MANY SMALL ONES? Justin Swanhart FOSDEM 2013 PLMCE 2013
  • 2. Agenda • Introduction • The new age of multi-core (terminology and background) • Enter Shard-Query • Performance comparison • Q/A www.percona.com
  • 3. Introduction www.percona.com
  • 4. Who am I? My name is Justin “Swany” Swanhart I’m a trainer at Percona www.percona.com
  • 5. FOSS I maintain • Shard-Query • MPP distributed query middleware for MySQL* • Work is mostly divided up using sharding and/or table partitioning (divide) • Distributes work over many machines in parallel (conquer) www.percona.com
  • 6. FOSS I maintain • Flexviews • Caches result sets and can refresh them efficiently based on only the database changes since the last refresh • Refresh cost is directly proportional to the number of changes to the base data www.percona.com
  • 7. This talk is about Shard-Query www.percona.com
  • 8. PERFORMANCE IN THE NEW AGE OF MULTI-CORE www.percona.com
  • 9. The new age of multi-core CPU CPU Core Core Core Core Core Core Core Core Core Core Core Core “If your time to you is worth saving, Core Core Core Core then you better start swimming. CPU CPU Or you'll sink like a stone. Core Core Core Core Core Core Core Core For the times they are a-changing.” Core Core Core Core Core Core Core Core - Bob Dylan www.percona.com
  • 10. Moore’s law • The number of transistors in a CPU doubles every 24 months • In combination with other component improvements, this allowed doubling of CPU clock speed approximately every 18 months • Frequency scaling beyond a few GHz has extreme power requirements • Increased power = increased heat www.percona.com
  • 11. Moore’s law today • Speed now doubles more slowly: 36 months • Power efficient multiple-core designs have become very mature • Larger number of slower cores which use less power in aggregate www.percona.com
  • 13. Answer: It depends A program is only faster on a multiple core CPU if it can use more than one core www.percona.com
  • 14. Why? 1. A physical CPU may have many logical cpu 2. Each logical CPU runs at most one thread at one time 3. Max running threads: 1. Physical CPU count x CPU core density x threads per core • Dual CPU with 3 HT cores = 2 x 3 x 2 www.percona.com
  • 15. What is a thread? • Every program uses at least one thread • A multithreaded program can do more work • But only if it can split the work into many threads • If it doesn’t split up the work: then there is no speedup. www.percona.com
  • 16. What is a task? • An action in the system which results in work • A login • A report • An individual query • Tasks are single or multi-threaded • Tasks may have sub-tasks www.percona.com
  • 17. Response Time • Time to complete a task in wall clock time • Response time includes the time to complete all sub-tasks • You must include queue time in response time www.percona.com
  • 18. Throughput • How many tasks complete in a given unit of time • Throughput is a measure of overall work done by the system www.percona.com
  • 19. Throughput vs Response time Response time = Time / Task Throughput = Tasks / Time www.percona.com
  • 20. Efficiency • This is a score about how well you scheduled the work for your task. Inefficient workloads underutilize resources Resources Used ( Resources Available x 100 ) www.percona.com
  • 21. Efficiency Example Resources Used ( Resources Available x 100 ) • CPU bound workload given 8 available cores • When all work is done in single thread:12.5% CPU efficiency • If the work can be split into 8 threads: 100% CPU efficiency www.percona.com
  • 22. Lost time is lost money • You are paying for wall clock time • Return on investment is directly proportional to efficiency. • You can’t bank lost time • If you miss an SLA or response time objective you lost the time investment, plus you may have to pay an penalty • It may be impossible to get critical insight www.percona.com
  • 23. Scalability When resources are added what happens to efficiency? • Given the same workload, if throughput does not increase: • Adding even more resources will not improve performance • But you may have the resources for more concurrent tasks • This is the traditional database scaling model www.percona.com
  • 24. Scalability When resources are added what happens to efficiency? • If throughput increases • The system scales up to the workload • When people ask if a system is scalable, this is usually what they want to know: www.percona.com
  • 25. Scalability When resources are added what happens to efficiency? • If throughput goes down there is negative scalability • Mutexes are probably the culprit • This is the biggest contention point for databases with many cores • This means you can’t just scale up a single machine forever. You will always hit a limit. www.percona.com
  • 26. Response time is very important! • Example: • It takes 3 days for a 1 day report • It doesn’t matter if you can run 100 reports at once • Response time for any 1 report is too high if you need to make decisions today about yesterday www.percona.com
  • 27. Workload • A workload consists of many tasks • Typical database workloads are categorized by the nature of the individual tasks www.percona.com
  • 28. OLTP • Frequent highly concurrent reads and writes of small amounts data per query. • Simple queries, little aggregation • Many small queries naturally divide the work over many cores www.percona.com
  • 29. Known OLTP scalability issues • Many concurrent threads accessing critical memory areas leads to mutex contention • Reducing global mutex contention main dev focus • Mutex contention is still the biggest bottleneck • Prevents scaling up forever (32 cores max) www.percona.com
  • 30. OLAP (analytical queries) • Low concurrency reads of large amounts of data • Complex queries and frequent aggregation • STAR schema common (data mart) • Single table (machine generated data) common • Partitioning very common www.percona.com
  • 31. Known OLAP scalability issues • IO bottleneck usually gets hit first • However, even if all data is in memory it still may be impossible to reach the response time objective • Queries may not be able to complete fast enough to meet business objectives because: • MySQL only supports nested loops* • All queries are single threaded * MariaDB has limited support for hash joins www.percona.com
  • 32. You don’t need a bigger boat • Buying a bigger server probably won’t help for individual queries that are CPU bound. • Queries are still single threaded. www.percona.com
  • 33. You need to change the workload! www.percona.com
  • 34. You need to change the workload! • Turn OLAP into something more like OLTP • Split one complex query into many smaller queries • Run those queries in parallel • Put the results together so it looks like nothing happened • This leverages multiple cores and multiple servers www.percona.com
  • 35. Life sometimes give you lemons www.percona.com
  • 36. Enter Shard-Query • Shard-Query transparently splits up complex SQL into smaller SQL statements (sub tasks) which run concurrently • Proxy • REST / HTTP GUI • PHP OO interface www.percona.com
  • 37. Why not get a different database? • Because MySQL is a great database and sticking with it has advantages • Operations knows how to work with it (backups!) • It is FOSS (alternatives are very costly or very limited) • MySQL’s special dialect means changes to apps to move to an alternative database • The alternatives are either a proprietary RDBMS* or map/reduce. For many reasons these are undesirable. * Vertica, Greenplum, Vectorwise, AsterData, Teradata, etc… www.percona.com
  • 38. Smaller queries are better queries • This is closer to the OLTP workload that the database has been optimized for • Smaller queries use smaller temporary tables resulting in more in-memory aggregation • This reduces the usage of temporary tables on disk • Parallelism reduces response time www.percona.com
  • 39. Data level parallelism • Table level partitioning • One big table is split up into smaller tables internally • Reduce IO and contention on shared data structures in the database. • MySQL could operate on partitions in parallel • But it doesn’t www.percona.com
  • 40. Data level parallelism (cont) • Sharding • Horizontally partition data over more than one server. • Shards can naturally be operated on in parallel. • This is called shared-nothing architecture, or data level parallelism. • This is also called scaling out. www.percona.com
  • 41. Shard-Query www.percona.com
  • 42. Not just for sharding • The name is misleading as you have choices: • Partition your data on one big server to scale up • Shard your data onto multiple servers to scale out • Do both for extreme scalability • Or neither, but with less benefits www.percona.com
  • 43. Parallelism of SQL dataflow • Shard-Query can add parallelism and improve query performance based on SQL language constructs: • IN lists and subqueries are parallelized • BETWEEN on date or integer operands adds parallelism too • UNION ALL and UNION queries are parallelized* • Uncorrelated subqueries are materialized early, are indexed and run in parallel (inside out execution) * They have to have features necessary to enable parallelism. You really need to partition and shard for best results www.percona.com
  • 44. Shard-Query is not for OLTP • Parser and execution strategy is “relatively expensive” • For OLAP this time is a small fraction of the overall execution time and the cost is “cheap” • Tying to turn OLTP into OLTP is wasted effort and waste reduces efficiency • But OLTP still caps out at 32 cores on a single box www.percona.com
  • 45. Sharding for OLTP • Use Shard-Key-Mapper • Use this helper class to figure out which shard to use • Then send queries directly to that shard (bypass parser) • This could be made transparent with mysqlnd plugins • You can then still use SQ when complex query tasks are needed • This is a topic for another day www.percona.com
  • 46. Why: Amdahl’s Law • Speedup of parallelizing a task is bounded by the amount of serialized work the task must perform f(N) = maximum speedup by parallelizing a process 1 P f(N) = + (1–P) N N = Number of threads that may be run in parallel P = Portion in percent, of the runtime of the program that may be parallelized (1 – P) = Portion in percent, of the runtime which is serialized www.percona.com
  • 47. 1 P Amdahl’s Law Example f(N) = (1–P) + N Your task: SUM a set of 10000 items • Serially summing the items takes 10000* seconds • Solution: Partition the set into 10 subsets • Sum items using one thread per partition * Times are exaggerated for demonstration purposes. www.percona.com
  • 48. 1 P Amdahl’s Law Example f(N) = (1–P) + N • Summing 10 sets items in parallel takes 1000 seconds (10000s/10=1000s) • Add an extra 10 seconds to add up the 10 sub- results • This is .1% of the original runtime Max speedup = (.1–99.9))+(99.0/10) = 9.98X * Times are exaggerated for demonstration purposes. www.percona.com
  • 49. Amdahl’s Law Example 2 Your task: STDDEV a set of 10000 items • Some aggregate functions like STDDEV can’t be distributed like SUM • Serially doing a STDDEV takes 10000 seconds • Because it can’t be split, we are forced to use one thread • The STDDEV takes 10000 second (10000/1) • Max speedup = 1.0 (no speedup) • Overhead may reduce efficiency www.percona.com
  • 50. You get to move all the pieces at the same time T32 T1 T1 T48 T4 T4 T64 T8 T8 www.percona.com
  • 51. Shard-Query 2.0 Features • Automatic sharding and massively parallel loading • Add new shards then spread new data over them automatically • Long query support • Supports asynchronous jobs for running long queries • Complex query support • GROUP BY, HAVING, LIMIT, ORDER BY, WITH ROLLUP, derived tables, UNION, etc www.percona.com
  • 52. Shard-Query 2.0 Features • Automatic sharding and massively parallel loading • Add new shards then spread new data over them automatically • Long query support • Supports asynchronous jobs for running long queries www.percona.com
  • 53. Shard-Query 2.0 Sharding Shard- and/or Parallel + Gearman + Partitioned = Query Execution Tables Task1 Shard1 Partition 1 SQL Task2 Shard1 Partition 2 GUI Proxy PHP OO DATA Task3 Shard2 Partition 1 Task4 Shard2 Partition 2 Data Flow www.percona.com
  • 54. Code Maturity • Revision 1, May 22, 2010 0.0.1 • 270 lines of PHP code checked into Google Code • Limited select support, no aggregation pushdown • One developer (me) – only suited for limited audience as a POC • Not object oriented, or a framework, just a simple CLI PHP app • Revision 447, Jan 28, 2013 – Shard Query beta 2.0.0 • Over 9700 lines of PHP code • Full coverage of SELECT statement plus custom aggregate function support • Support for all almost DML and DDL operations (except create/drop database) • Two additional developers • Special thanks to Alex Hurd who is a production tester and contributor of the REST interface • And Andre Rothe, who contributes to the SQL parser • REST web interface, MySQL proxy Lua script, OO PHP library framework • Feature complete and stable to a level fit for community use www.percona.com
  • 55. One big server or many small ones www.percona.com
  • 56. Big Data: Wikipedia Traffic Stats • Wikipedia traffic stats • 2 months of data (Feb and Mar 2008) • 180 GB raw data (single table) • No PK • Partitioned by date, one partition per day • One index, on the wiki language (“en”, etc) • 360 GB data directory after loading • 7 years of data is available (15TB) http://dumps.wikimedia.org/other/pagecounts-raw/ www.percona.com
  • 57. Big Data: Wikipedia Traffic Stats CREATE TABLE `fact` ( `date_id` int(11) NOT NULL, Data set is sharded by `hour_num` tinyint(4) NOT NULL, hour_num to evenly `page` varchar(1024) NOT NULL, distribute work between `project` char(3) NOT NULL, servers (each server has `language` varchar(12) NOT NULL, 1/8 of the data) `cnt` bigint(20) NOT NULL, `bytes` bigint(20) NOT NULL, KEY `language` (`language`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY RANGE (date_id) SQ can split tasks over (PARTITION p1 VALUES LESS THAN (20080101) ENGINE = InnoDB, partitions too, for multiple PARTITION p2 VALUES LESS THAN (20080102) ENGINE = InnoDB, tasks per shard PARTITION p3 VALUES LESS THAN (20080103) ENGINE = InnoDB, …, PARTITION p91 VALUES LESS THAN (20080331) ENGINE = InnoDB, PARTITION p92 VALUES LESS THAN (20080401) ENGINE = InnoDB, PARTITION pmax VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB, ) www.percona.com
  • 58. EC2 Instance Sizes Instance Sizes Compared Instance Feature Measure Price Per hour Size cores 2 0.41 USD ecu 6.5 m2.xlarge ecu/core 3.25 The Pawn storage 420 memory 17.1 cores 16 3.1 USD ecu 35 hs1.8xlarge ecu/core 2.25 The King storage 2048 For a few cents more we memory 60.5 get much more CPU power in aggregate if Scale Out eight smaller machines Instance 1x 8x 8X Price Per Size Feature Machine Machines hour are used cores 2 16 3.28 USD ecu 6.5 52 m2.xlarge ecu/core 3.25 3.25 The Pawn storage 420 3360 Working set fits into memory 17.1 136.8 cores 16 128 the ram of 8 servers ecu 35 280 hs1.8xlarge ecu/core 2.25 2.25 The King storage 2048 16384 memory 60.5 484 www.percona.com
  • 59. Anybody want to bet on the King? www.percona.com
  • 60. Working Set • The queries examine up to 15 days of data • It is important that the working set be able to be kept in memory for best performance • Each partition contains approximately 8GB of data • Working set is 120GB of InnoDB pages (60GB raw) • Doesn’t fit on a single server http://dumps.wikimedia.org/other/pagecounts-raw/ www.percona.com
  • 61. Don’t bet on the king… Single-Threaded select count(*) Query from stats.fact Performance the where date_id = 20080214; 8 shards should unsharded +----------+ be able to do the work | count(*) | In ~18s or ~2 seconds, data set +----------+ depending on where | 84795020 | the bottleneck is. +----------+ Instance Sizes Compared (again) Instance Performance Feature Measure Price Per hour Size COLD / HOT cores 2 0.41 USD ecu 6.5 m2.xlarge ecu/core 3.25 2m22s / 33s .21/core The Pawn storage 420 memory 17.1 cores 16 3.1 USD ecu 35 hs1.8xlarge ecu/core 2.25 1m10s / 43s .19/core The King storage 2048 memory 60.5 www.percona.com
  • 62. Simple In-Memory COUNT(*) query performance 800 Days 8 Pawns The King 750.457135 700 1 2.552858 40.84573 643.0426209 2 5.090356 81.4457 600 3 8.064888 129.0382 526.9528692 4 10.74412 171.9059 500 5 13.32697 213.2316 8 Pawns The King 6 16.0227 256.3633 400 405.4624271 Linear (8 Pawns) 7 18.50571 296.0914 Linear (The King) 8 21.02053 336.3285 300 296.091397 9 25.3414 405.4624 213.2315872 10 29.69324 475.0918 200 11 32.93455 526.9529 129.0382018 12 36.5517 584.8272 100 40.87 13 40.19016 643.0426 44.69 2.552858055 40.19016381 32.93455432 25.3414017 14 42.75 699.1011 0 18.50571232 13.3269742 8.064887613 15 44.69 750.4571 www.percona.com
  • 63. Where to go from here • Shard-Query is not the answer to every performance problem • Flexviews can be used too • This helps when the working set is too large to keep in memory even with many machines • Cost of aggregation is amortized over time, and distributed over many machines if you still shard • Scales reads, writes, aggregation • But it is an excellent addition to your toolbox www.percona.com
  • 64. Distributed row store w/ Galera • Each shard is an Percona XtraDB or MariaDB cluster • Support massive ingestion rates via MPP loader • real time ad-hoc complex querying • All the components support HA • Galera, Gearman, Apache, PHP, MySQL proxy • Redundancy can be fully geographically distributed • Use partitioning at the table level too www.percona.com
  • 65. Distributed document store • Each shard is a MariaDB cluster • Use dynamic columns and anchor models for dynamic schema www.percona.com
  • 66. Distributed column store • Each shard is a Infobright database • Pros • Hash joins, small data footprint, no indexes • Cons • Single threaded loading (one thread per shard, can’t take advantage of MPP loader) • Append only (LOAD DATA INFILE) • No partitions • SQL harder to parallelize for scale up www.percona.com
  • 67. justin.swanhart@percona.com @jswanhart We're Hiring! www.percona.com/about-us/careers/
  • 68. I’m also speaking at: 1. Detecting and preventing SQL injection with the Percona Toolkit 2. Divide and Conquer in the cloud: One big server or many small ones? 3. Introduction to open source column stores PLMCE.com April 2013