Methods of Sharding MySQL
            Percona Live NYC 2012
Who are Palomino?
Bespoke Services: we work with and like you.
Production Experienced: senior DBAs, admins, and engineers.
24x7: globally-distributed on-call staff.
Short-term no-lock-in contracts.
Professional Services (DevOps):
 ➢ Chef,

 ➢ Puppet,

 ➢ Ansible.


Big Data Cluster Administration (OpsDev):
 ➢ MySQL, PostgreSQL,

 ➢ Cassandra, HBase,

 ➢ MongoDB, Couchbase.
Methods of Sharding MySQL
               Percona Live NYC 2012
Who am I?
Tim Ellis
CTO/Principal Architect, Palomino

Achievements:
 ➢ Palomino Big Data Strategy.

 ➢ Datawarehouse Cluster at Riot Games.

 ➢ Back-end Storage Architecture for Firefox Sync.

 ➢ Led DB teams at Digg for four years.

 ➢ Harassed the Reddit team at one of their parties.


Ensured Successful Business for:
 ➢ Digg, Friendster,

 ➢ Riot Games,

 ➢ Mozilla,

 ➢ StumbleUpon.
Methods of Sharding MySQL
         What is this Talk?
Large cluster admin: when one DB isn't enough.
 ➢ What is a shard?

 ➢ What shard types can I choose?

 ➢ How to build a large DB cluster.

 ➢ How to administer that giant mess of DBs.




Types of large clusters:
 ➢ Just a bunch of databases.

 ➢ Distributed database across machines.
Methods of Sharding MySQL
         Where the Focus will Lie
12% – Sharding theory/considerations.

25% – Building a Cluster to administer (tutorial):
 ➢ Palomino Cluster Tool.




50% – Flexible large-cluster administration (tutorial):
 ➢ Tumblr's Jetpants.




13% – Other sharding technologies (talk-only):
 ➢ Youtube's Vtocc (Vitess),

 ➢ Twitter's Gizzard,

 ➢ HAproxy.
Methods of Sharding MySQL
         What about the Silver Bullets?
NoSQL Distributed Databases:
➢ Promise “sharding” for free,

➢ Uptime and horizontal scaling trivially.




Reality:
➢ RDBMS is 40-yr-old tech,

➢ NoSQL is 10-yr-old tech,

➢ Which is responsible for more high-profile
  downtimes in the past 10 years?
➢ Evaluate the alternatives without illusions.
Methods of Sharding MySQL
                      What is a Shard?
A location for a subset of data:
➢ Itself made of pieces.

➢ Typically itself redundant.



   [Diagram: three shards (User Data, Logging Data, Posts Data),
   each with one Master replicating to three Slaves.]
Methods of Sharding MySQL
         What are the Sharding Method Choices?
By-Function:
➢ Move busy tables onto new shard.

➢ Writes of busiest tables on new hardware.

➢ Writes of remaining tables on current.


By-Columns:
➢ Split table into chunks of related columns,

  store each set on its own Master/Slaves shard.
By-Rows:
➢ A table is split into N shards, shard gets a

  subset of the rows of the table.
Methods of Sharding MySQL
         Shard Method Choices
By-function and By-Column Methods:
➢ Much easier.

➢ Can get you through months to years.

➢ Eventually you run out of options here.




By-Row Method:
➢ The hardest to do.

➢ Requires new ways of accessing data.

➢ Often requires sophisticated cache strategies.

➢ Itself can be done several ways.
Methods of Sharding MySQL
         By-Function Sharding
Picking a Functional Split:
 ➢ A subset of tables commonly joined.

 ➢ Tables outside this subset nearly never joined.

 ➢ One of them responsible for many writes.




Every JOIN to a table outside the subset must be
rewritten as code-based multi-SELECTs.

Once subset of tables moved onto their own
server, writes are distributed.
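
A minimal sketch of what "rewriting JOINs into code-based multi-SELECTs" can look like. Two in-memory SQLite databases stand in for two MySQL shards so the example runs anywhere; the table names, columns, and the posts_with_authors function are hypothetical, not taken from the talk.

```python
import sqlite3

# Assumption: each connection represents a separate functional shard.
users_db = sqlite3.connect(":memory:")
posts_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

posts_db.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
posts_db.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(10, 1, "first"), (11, 2, "second"), (12, 1, "third")],
)


def posts_with_authors(limit=10):
    """Replace `posts JOIN users` with two SELECTs stitched together in code."""
    # First SELECT: fetch the posts from the posts shard.
    posts = posts_db.execute(
        "SELECT id, user_id, title FROM posts ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in posts})
    if not user_ids:
        return []
    # Second SELECT: fetch only the users those posts reference.
    placeholders = ",".join("?" for _ in user_ids)
    names = dict(
        users_db.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
        ).fetchall()
    )
    # The "join" itself now happens in the application.
    return [(post_id, title, names.get(user_id)) for post_id, user_id, title in posts]


for row in posts_with_authors():
    print(row)
```

The second query is batched with IN (...) so that the application issues two round trips rather than one per row.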
Methods of Sharding MySQL
         By-Column Sharding (Vertical Partition)
Identifying candidate table:
 ➢ Many columns (“users” anyone?),

 ➢ Many updates,

 ➢ Many indexes.




Required: even split of columns/indexes by
update frequency. Attempt: logical grouping.

JOINs neither possible nor desirable: write
multi-SELECT code in application DAL.
Methods of Sharding MySQL
         Row-based Sharding Choices
Range-based Sharding:
➢ Easy to understand.

➢ Each shard gets a range of rows.

➢ Oft-times some shards are “hot.”

➢ Hot shards are split into separate shards.

➢ Cold shards are joined into a single shard.

➢ Juggling shard load is a frequent process.




Typically the best solution. Shortcomings have
known work-arounds.
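
A rough sketch of range-based routing and of splitting a hot range, assuming integer row ids. The shard names, boundaries, and the functions shard_for and split_range are illustrative assumptions, not part of any tool mentioned in this talk.

```python
import bisect

# Hypothetical shard map: (first_id, shard_name), sorted by first_id.
# A shard owns every id from its first_id up to the next entry's first_id - 1.
RANGES = [(1, "shard-a"), (300001, "shard-b"), (600001, "shard-c")]


def shard_for(ranges, row_id):
    """Return the shard whose range contains row_id."""
    starts = [start for start, _ in ranges]
    idx = bisect.bisect_right(starts, row_id) - 1
    if idx < 0:
        raise ValueError(f"id {row_id} is below the first range")
    return ranges[idx][1]


def split_range(ranges, split_at, new_shard):
    """Split a hot range: ids >= split_at (up to the next range) move to new_shard."""
    if any(start == split_at for start, _ in ranges):
        raise ValueError("a range already starts at this id")
    return sorted(ranges + [(split_at, new_shard)])


print(shard_for(RANGES, 300000))
print(shard_for(RANGES, 300001))

# Pretend shard-a is hot and give its upper half to a new shard.
new_ranges = split_range(RANGES, 150001, "shard-a2")
print(shard_for(new_ranges, 200000))
```

Only the routing changes here; in practice the rows for the moved ids also have to be copied to the new shard before the map is switched.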
Methods of Sharding MySQL
         Row-based Sharding Choices
Modulus/Hash-based Sharding:
➢ Row key is hashed to integer modulo number

  of shards, then placed on that shard.
➢ Only rarely are some shards “hot.”

➢ Shard splitting is difficult to implement.




Also a common method of sharding. We hope
not to split shards often (or ever).

When we do, it's a multi-week process.
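
A small sketch of hash/modulus placement, and of why changing the shard count is painful. CRC32 is used here only as an example of a stable hash; the key format and function names are assumptions made for illustration.

```python
import zlib


def shard_for(key, num_shards):
    """Hash the row key to an integer, then take it modulo the shard count."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def fraction_moved(keys, old_n, new_n):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10000)]
print(fraction_moved(keys, 4, 5))
```

With a uniform hash, going from 4 to 5 shards leaves a key in place only when its hash modulo 4 equals its hash modulo 5, which is about one key in five. Most rows therefore have to move, which is one reason a split under this scheme is such a large project.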
Methods of Sharding MySQL
         Row-based Sharding Choices
Lookup Table-based Sharding:
 ➢ Easy to understand.

 ➢ Row key mapped to shard in a lookup table.

 ➢ Easy to move load off hot shards.

 ➢ Lookup table method is problematic:

   ➢ Single point of failure.
   ➢ Performance bottleneck.


   ➢ Billions of rows, itself may need sharding.
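
A toy sketch of lookup-table routing, with an in-process dict standing in for the lookup table. The ShardDirectory class and the shard names are hypothetical. In a real deployment this table is the component that becomes the single point of failure and bottleneck described above.

```python
class ShardDirectory:
    """Maps each row key to a shard; unknown keys fall back to a default shard."""

    def __init__(self, default_shard):
        self._map = {}
        self._default = default_shard

    def lookup(self, key):
        return self._map.get(key, self._default)

    def move(self, key, new_shard):
        # Moving load off a hot shard is just a table update
        # (after the row data itself has been copied).
        self._map[key] = new_shard


directory = ShardDirectory(default_shard="shard-a")
directory.move("user:42", "shard-b")
print(directory.lookup("user:42"))
print(directory.lookup("user:7"))
```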
Prerequisite: Build a Large Cluster
         Allocating the Hardware
Getting Hardware – your own company's:
➢ Can be politically-charged.

➢ Get a small batch first.

➢ Build small demonstration cluster.

➢ Get everyone on-board with the demo.


Renting/Leasing Hardware – the Cloud:
➢ Allocate hardware in EC2 or elsewhere.

➢ Usually easier, but possibly harder admin:

   ➢ Hardware failure more common.
   ➢ Hardware/network flakiness more common.
Prerequisite: Build a Large Cluster
        Building the Cluster




Okay, I've got the hardware. What next?
Prerequisite: Build a Large Cluster
         Building the Cluster
Configuring the Hardware. The old dilemma:
➢ Spend days to install/configure DB software?


  Subsequent management is painful.
➢ Use SSH in “for” loops?


  Rolling your own configuration management
  tools is a lot of work.
➢ Learn a configuration management tool?


  Obvious choice in 2012. Well-documented
  tools like Chef, Puppet, Ansible.
Configuration Management Tools
         My Experience
Puppet: 6 years ago at Digg
 ➢ Manage/Deploy hundreds of servers.

 ➢ Painful, but not as bad as hand-coding it all.


Chef: 2 years ago at Drawn to Scale and Riot
 ➢ Manage/Deploy dozens of servers.

 ➢ Learning Ruby is a “joy” of its own.


Ansible: 6 months ago at Palomino
 ➢ Manage/Deploy dozens of servers.

 ➢ First Palomino Cluster Tool subset built.
Prerequisite: Build a Large Cluster
         Configuration Management Options
Pick your Configuration Management:
 ➢ Chef: Popular, use Ruby to “code your

   infrastructure.” Must learn Ruby.
 ➢ Puppet: Mature, use data structures to “define

   your infrastructure.” Less coding.
 ➢ Ansible: Tiny and modular, similar to Puppet,

   but with ordering for deployment. Pragmatic.
Write/Get Recipes, Manifests, Playbooks?
 ➢ Writing is tedious. Can take >1 week.

 ➢ Get from internet? Often incomplete.
Prerequisite: Build a Large Cluster
               The Palomino Cluster Tool
Palomino's tool for building large DB clusters:
 ➢ Chef, Puppet, Ansible modules.

 ➢ Open-source on Github.

     ➢   https://github.com/time-palominodb/PalominoClusterTool
     ➢   Google: “Palomino Cluster Tool.”
➢   Will build a large cluster for you in hours:
     ➢ Master(s)
     ➢ Slaves – hundreds of them as easy as two.


     ➢ MHA – when master fails, a slave takes over.


➢   Previously this would take days.
The Palomino Cluster Tool
         Building the Management Node
Cluster Management Node:
➢ Will build the initial cluster.

➢ Will do subsequent cluster management.




Tool for Initial Cluster Build:
 ➢ Palomino Cluster Tool (Ansible subset).




Tool for Cluster Management:
 ➢ Jetpants (Ruby).
The Palomino Cluster Tool
           Building the Management Node
Palomino Cluster Tool (Ansible subset).

Why Ansible?
➢ No server to set up, simply uses SSH.

➢ Easy-to-understand non-code Playbooks.

➢ Use a language you know for modules.

➢ For demo purposes, obvious choice.

➢ Also production-worthy:

   ➢   Built by Michael DeHaan, long-time
       configuration management guru.
The Palomino Cluster Tool
          Building the Management Node
Management node lives alongside your cluster.
➢ We are building our cluster in EC2.

➢ Thus management node in EC2.

➢ This tutorial assumes Ubuntu 12.04.

➢ t1.micro is fine for management node.




Install basic tools:
 ➢ apt-get install git (for Ansible/P.C.T.)

 ➢ apt-get install make python-jinja2 (for

   Ansible)
The Palomino Cluster Tool
         Configuring the Management Node
Install Ansible:
 ➢ git clone git://github.com/ansible/ansible.git

 ➢ make install




Install Palomino Cluster Tool:
 ➢ git clone git://github.com/time-palominodb/PalominoClusterTool.git

I think we just finished the management node!
The Palomino Cluster Tool
         Allocating Shard Nodes
Shard nodes:
 ➢ m1.small or larger: at least 1.6GB RAM,

 ➢ :3306, :80, and :22 open between all (one

   security group in EC2),
 ➢ Ubuntu 12.04 (other Debian-alikes at your

   own risk – but may work!).

Do not need OS/database configuration:
➢ Ansible will configure them.
The Palomino Cluster Tool
            Building the First Shard – Step 1
  From README: edit IP addresses in cluster
  layout file (PalominoClusterToolLayout.ini):
# Alerting/Trending -----
[alertmaster]
10.252.157.110
[trendmaster]
10.252.157.110

# Servers -----
[mhamanager]
10.252.157.110


  This section identical for all Shards.
The Palomino Cluster Tool
            Building the First Shard – Step 2
  From README: edit IP addresses in cluster
  layout file (PalominoClusterToolLayout.ini):
[mysqlmasters]
10.244.17.6

[mysqlslaves]
10.244.26.199
10.244.18.178

[mysqls:vars]
master_host=10.244.17.6


  This section different for every Shard.
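
Because only this section changes between shards, it could in principle be generated rather than hand-edited. The sketch below is an assumption about how one might do that; render_shard_layout is a hypothetical helper, not part of the Palomino Cluster Tool, and it emits only the three groups shown on this slide.

```python
def render_shard_layout(master, slaves):
    """Render the per-shard part of the layout file as text."""
    lines = ["[mysqlmasters]", master, "", "[mysqlslaves]"]
    lines.extend(slaves)
    lines += ["", "[mysqls:vars]", f"master_host={master}"]
    return "\n".join(lines) + "\n"


text = render_shard_layout("10.244.17.6", ["10.244.26.199", "10.244.18.178"])
print(text)
```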
The Palomino Cluster Tool
            Building the First Shard – Step 3
  Run setup command to put configuration and
  SSH keys into /etc:
$ cd PalominoClusterTool/AnsiblePlaybooks/Ubuntu-12.04
$ ./00-Setup_PalominoClusterTool.sh ShardA


  Run build command – it's a wrapper around
  Ansible Playbooks:
$ ./10-MySQL_MHA_Manager.sh ShardA
The Palomino Cluster Tool
            Building the Second Shard
  Just make one shard with a master and many
  slaves. In real life, you might do something like
  this instead:
for i in ShardB ShardC ShardD ; do
  # manual step: edit the IP addresses for this shard
  vim PalominoClusterToolLayout.ini
  # scriptable steps:
  ./00-Setup_PalominoClusterTool.sh "$i"
  ./10-MySQL_MHA_Manager.sh "$i"
done


  Run them in separate terminals to save time.
Make the Cluster Real
              Data makes Shard Split Interesting
    Fill ShardA using random data script.*

    Palomino Cluster Tool includes such a tool.
     ➢ HelperScripts/makeGiantDatafile.pl



$   ssh root@sharda-master
#   cd PalominoClusterTool/HelperScripts
#   mysql -e 'create database palomino'
#   ./makeGiantDatafile.pl 1200000 3 | mysql -f palomino


    Install Jetpants, do shard split now.
    * Be sure /var/lib/mysql is on large partition!
Administering the Cluster
              Install Jetpants
    General idea: Install Ruby >=1.9.2 and
    RubyGems, then Jetpants via RubyGems.

    On my systems, /etc/alternatives is always
    incorrect, so symlink (ln) the proper binaries for Jetpants.
#   apt-get install ruby1.9.3 rubygems libmysqlclient-dev
#   ln -sf /usr/bin/ruby1.9.3 /etc/alternatives/ruby
#   ln -sf /usr/bin/gem1.9.3 /etc/alternatives/gem
#   gem install jetpants
Administering the Cluster
              Configure Jetpants
    General idea: edit /etc/jetpants.yaml, then
    create the Jetpants inventory and application
    configuration files and chown them to the Jetpants user:
#   vim /etc/jetpants.yaml
#   mkdir -p /var/jetpants
#   touch /var/jetpants/assets.json
#   chown jetpantsusr: /var/jetpants/assets.json
#   mkdir -p /var/www
#   touch /var/www/databases.yaml
#   chown jetpantsusr: /var/www/databases.yaml
Administering the Cluster
              Jetpants Shard Splits
  Tell Jetpants Console about your ShardA:
Jetpants> s = Shard.new(1, 999999999, '10.12.34.56', :ready) # 10.12.34.56 == ShardA master
Jetpants> s.sync_configuration


  Create spares within Console for all others
  (improved workflow in Jetpants 0.7.8):
Jetpants>   topology.tracker.spares << '10.23.45.67'
Jetpants>   topology.tracker.spares << '10.23.45.68'
Jetpants>   topology.tracker.spares << '10.23.45.69'
Jetpants>   topology.write_config
Jetpants>   topology.update_tracker_data
Administering the Cluster
           Jetpants Shard Splits
Just for this tutorial:
 ➢ Create the “palomino” database,

 ➢ Break the replication on all the spares,

 ➢ Be sure spares are read/write:

     ➢ Edit my.cnf,
     ➢ service mysql restart


➢   Ensure “jetpants pools” output is correct:
     ➢ One master,
     ➢ Two slaves.
Administering the Cluster
            Jetpants Shard Splits
  How to perform an actual Shard Split:
$ jetpants shard_split --min-id=1 --max-id=999999999


  Notes:
  ➢ Process takes hours. Use screen or nohup.

  ➢ LeftID == parent's first, RightID == parent's

    last, no overlap/gap.
  ➢ Make children 1-300000,300001-999999999.
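
The "no overlap/gap" rule in the notes above can be checked mechanically before starting a multi-hour split. This is an illustrative sketch only; validate_split is a hypothetical helper and not a Jetpants API.

```python
def validate_split(parent, children):
    """Check that child id ranges tile the parent range exactly.

    parent is (min_id, max_id); children is a list of (min_id, max_id).
    Returns the children sorted by min_id, or raises ValueError.
    """
    ordered = sorted(children)
    for low, high in ordered:
        if low > high:
            raise ValueError(f"empty range {low}-{high}")
    if ordered[0][0] != parent[0]:
        raise ValueError("first child must start at the parent's first id")
    if ordered[-1][1] != parent[1]:
        raise ValueError("last child must end at the parent's last id")
    for (_, prev_high), (next_low, _) in zip(ordered, ordered[1:]):
        if next_low != prev_high + 1:
            raise ValueError(f"gap or overlap between {prev_high} and {next_low}")
    return ordered


print(validate_split((1, 999999999), [(1, 300000), (300001, 999999999)]))
```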
Jetpants Shard Splitting
                         The Gory Details
       After “jetpants shard_split”:
ubuntu@ip-10-252-157-110:~$ jetpants pools
shard-1-999999999 [3GB]
master          = 10.244.136.107 ip-10-244-136-107
standby slave 1 = 10.244.143.195 ip-10-244-143-195
standby slave 2 = 10.244.31.91 ip-10-244-31-91
shard-1-400000 (state: replicating) [2GB]
master          = 10.244.144.183 ip-10-244-144-183
shard-400001-999999999 (state: replicating) [1GB]
master          = 10.244.146.27 ip-10-244-146-27

   0   global pools
   3   shard pools
----   --------------
   3   total pools

   3   masters
   0   active slaves
   2   standby slaves
   0   backup slaves
----   --------------
   5   total nodes
Jetpants Improvements
         The Result of an Experiment
Jetpants only well-tested on RHEL/CentOS.

Palomino Cluster Tool only well-tested to build
Ubuntu 12.04 clusters.

Little effort to fix Jetpants:
 ➢ /sbin/service location different,

 ➢ service mysql status output different.
Jetpants Improvements
         The Result of an Experiment
Jetpants only well-tested on MySQL 5.1.

I built a cluster of MySQL 5.5.

A little more effort to fix Jetpants:
➢ Set master_host=' ' is syntax error,

➢ reset slave needs keyword “all” appended.
Jetpants Improvements
         The Result of an Experiment
Jetpants only well-tested on large datasets.

I built a cluster with only hundreds of MB.

A wee tad more effort to fix Jetpants:
➢ Some timings assumed large datasets,

➢ Edge cases for small/quick operations

  reported back to the author.
Jetpants Improvements
         OSS Collaboration and Win
Evan Elias implemented these fixes last week!
 ➢ jetpants add_pool,

 ➢ jetpants add_shard,

 ➢ jetpants add_spare (with sanity-check spare),

 ➢ Shards with 1 slave (not for prod!),

 ➢ read_only spares not fatal,

 ➢ Debian-alike (Ubuntu) fixes,

 ➢ MySQL 5.5 fixes,

 ➢ Mid-split Jetpants pools output simpler.


Really responsive ownership of project!
Twitter's Gizzard
             What is it?
General Framework for distributed database.
➢ Hides sharding from you.

➢ Literally, it is middleware.

    ➢ Applications connect to Gizzard,
    ➢ Gizzard sends connections to proper place,


    ➢ Shard splits and hardware failure taken care of.


➢ Created at Twitter by rogue cowboys.
➢ Not completely production-ready.

    ➢   Better than rolling your own!
Twitter's Gizzard
         Why should I use it?
You've settled on row-based partition scheme:
 ➢ Master nearing I/O capacity, won't scale up,

 ➢ Can't move some tables to their own pool,

 ➢ Can't split the columns/indexes out,

 ➢ You want to keep using the DBMS you

   already know and love: Percona Server.*
 ➢ Don't want to think about fault-tolerance or

   shard splits (much),

* Actually use any storage back-end.
Twitter's Gizzard
         The Fine Print
This sounds perfect. Why not Gizzard?

Writes must follow strict diet. Must be:
➢ Idempotent*,

➢ Commutative**,

➢ Must not have tuberculosis.




* Pfizer cannot remove the idempotency
requirement of Gizzard.
** Even on evenings and weekends.
Twitter's Gizzard
         Expanding the Fine Print
Idempotency:
 ➢ Submit a write. Again. And again.

 ➢ Must be identical to doing it once.

 ➢ Bad: “update set col = col + 1”




Commutative – writes in arbitrary order:
➢ WriteA→WriteB→WriteC on Node1.

➢ WriteB→WriteC→WriteA on Node2.

➢ Bad: “update set col1 = 42”→“update set

  col2 = col1 + 5”
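
The two properties can be tested on a toy model. In this sketch a row is a plain dict and each write is a function from row to row; all names are made up for illustration and none of this is Gizzard code.

```python
import itertools


def increment(row):
    # "update set col = col + 1": repeating it changes the result.
    return {**row, "col": row["col"] + 1}


def set_42(row):
    # "update set col = 42": repeating it changes nothing.
    return {**row, "col": 42}


def set_col1(row):
    return {**row, "col1": 42}


def set_col2_from_col1(row):
    # Depends on col1, so its result depends on ordering.
    return {**row, "col2": row["col1"] + 5}


def set_col2(row):
    return {**row, "col2": 7}


def is_idempotent(write, row):
    once = write(row)
    return write(once) == once


def is_commutative(writes, row):
    """True if every ordering of the writes yields the same final row."""
    results = set()
    for order in itertools.permutations(writes):
        state = row
        for write in order:
            state = write(state)
        results.add(tuple(sorted(state.items())))
    return len(results) == 1


start = {"col": 0, "col1": 0, "col2": 0}
print(is_idempotent(set_42, start), is_idempotent(increment, start))
print(is_commutative([set_col1, set_col2], start))
print(is_commutative([set_col1, set_col2_from_col1], start))
```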
Twitter's Gizzard
         Expanding the Fine Print
Cluster is Eventually Consistent:
➢ May return old values for reads.

➢ Unknown when consistency will occur.




Like a politician's position on the budget:
 ➢ Might be consistent in the future.

 ➢ Just not right now.

 ➢ Or now.
Twitter's Gizzard
           Working Around the Shortcomings
Gizzard work-around:
➢ Add timestamp to every transaction.

➢ Good:

     ➢ “col1.ts=1; update set col1=42” →
     ➢ “update set col2=col1 + 5 where col1.ts=1”


➢   Implementation trickier if DBMS doesn't
    support column attributes.

Cannot escape: must radically re-think schema
and application/DBMS interaction.
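
One common way to use per-column timestamps is a last-writer-wins rule, sketched below. This is a swapped-in illustration of the general idea and does not model the conditional "where col1.ts=1" form shown above. The row layout and function names are assumptions, and ties between equal timestamps are not handled.

```python
import itertools


def apply_write(row, col, value, ts):
    """Keep a column's value only if the incoming timestamp is newer."""
    current = row.get(col)  # stored as (value, ts)
    if current is None or ts > current[1]:
        return {**row, col: (value, ts)}
    return row


def replay(writes):
    row = {}
    for col, value, ts in writes:
        row = apply_write(row, col, value, ts)
    return row


writes = [("col1", 42, 1), ("col1", 43, 2), ("col2", 7, 1)]

# Replay every ordering, with each write delivered twice, and collect outcomes.
outcomes = {
    tuple(sorted(replay(order + order).items()))
    for order in itertools.permutations(writes)
}
print(len(outcomes))
print(replay(writes))
```

Because a stale or repeated write is simply ignored, replaying and reordering no longer change the final row, which is the property Gizzard requires of writes.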
Twitter's Gizzard
             Trying it Out
I'm convinced! How do I begin?
 ➢ Learn Scala.

 ➢ Clone “rowz” from Github.

    ➢   https://github.com/twitter/Rowz
➢ Modify it to suit your needs.
➢ Learn how it interacts with existing tools.

➢ Write new monitoring/alerting plugins.

➢ Write unit tests!

➢ You should OSS it to help with overhead.
Twitter's Gizzard
          Trying it Out
Sounds daunting. Maybe I'll roll my own?

Learn from others' mistakes:
 ➢ Digg: 2 engineers 6 months. Code thrown

   away. Digg out of business.
 ➢ Countless identical stories in Silicon Valley.




NIHS attitude == Go out of business*.

* 8-figure R&D budgets excepted.
Youtube's Vitess/Vtocc
         What is it?
Vitess is a library. Vtocc is an implementation
using it.

Vtocc is another middleware solution.
➢ Sharding,

➢ Caching,

➢ Connection-pooling,

➢ In-use at Youtube,

➢ Built-in fail-safe features.
Youtube's Vtocc
         Why use it?
Proven high-volume sharding solution.

Interesting feature-list:
 ➢ Auto query/transaction over-limit killing.

 ➢ Better query-cache implementation.

 ➢ Query comment-stripping for query cache.

 ➢ Query consolidation.

 ➢ Zero downtime restarts.




Less coding than Gizzard (more plug-in).
Youtube's Vtocc
         Hold on, Zero Downtime Restarts?
Just start new Vtocc instance.
 ➢ Instance1 passes new requests to Instance2,

 ➢ Instance1's connections get 30s to complete,

 ➢ Instance2 kills Instance1 and takes over.




   [Diagram: Vtocc Instance 1 handing requests over to Vtocc Instance 2.]
Youtube's Vtocc
          The Fine Print
Requires Particular Primary Keys:
➢ varbinary datatype,

➢ Choose carefully to prevent hot-spots.




Max result-set size: larger resultsets fail.

Additional administration burden:
➢ “My query was killed. Why?”

➢ Middleware adds spooky hard-to-diagnose

  failure modes.
Youtube's Vtocc
                 Implementation Details
➢   Run Vtocc on same server as MySQL.
➢   Configure Vtocc fail-safes for expected load:
    ➢ Pool Size (connection count),

    ➢ Max Transactions (has own connection pool),

    ➢ Query Timeout (before killed),

    ➢ Transaction Timeout (before killed),

    ➢ Max Resultset Size in rows

        ➢   Go language doesn't free allocated memory, so
            pick this value carefully.
➢   More details: http://code.google.com/p/vitess/wiki/Operations
HAproxy
        Re-thinking Proxy Topology
Old-school Proxy Topology:
➢ DB Clients on one side,

➢ DB Servers on the other,

➢ Proxy in-between.




                 Single Point of Failure
HAproxy
         Re-thinking Proxy Topology
Free proxy provides new architecture option:
 ➢ Proxy on every DB client node.

 ➢ Good-bye single-point-of-failure.

 ➢ Hello configuration management for proxy.



   [Diagram: one HAproxy instance on each DB client node.]
Methods of Sharding MySQL
         Q&A
Questions? Suggestions:
➢ Interesting stuff. Got a job for me?

➢ Well I got a job for you. Interested?

➢ Warn me next time so I can sleep in the back

  row.
➢ Was that a question?




Thank you! Emails to domain palominodb,
username time. Percona Live 2012 in New York
City. Enjoy the rest of the show!

Contenu connexe

Tendances

MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera Cluster
Abdul Manaf
 
Mysql User Camp : 20-June-14 : Mysql Fabric
Mysql User Camp : 20-June-14 : Mysql FabricMysql User Camp : 20-June-14 : Mysql Fabric
Mysql User Camp : 20-June-14 : Mysql Fabric
Mysql User Camp
 
Mysql User Camp : 20th June - Mysql New Features
Mysql User Camp : 20th June - Mysql New FeaturesMysql User Camp : 20th June - Mysql New Features
Mysql User Camp : 20th June - Mysql New Features
Tarique Saleem
 
Using MySQL in Automated Testing
Using MySQL in Automated TestingUsing MySQL in Automated Testing
Using MySQL in Automated Testing
Morgan Tocker
 

Tendances (20)

MySQL 5.7 Fabric: Introduction to High Availability and Sharding
MySQL 5.7 Fabric: Introduction to High Availability and Sharding MySQL 5.7 Fabric: Introduction to High Availability and Sharding
MySQL 5.7 Fabric: Introduction to High Availability and Sharding
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera Cluster
 
Mysql User Camp : 20-June-14 : Mysql Fabric
Mysql User Camp : 20-June-14 : Mysql FabricMysql User Camp : 20-June-14 : Mysql Fabric
Mysql User Camp : 20-June-14 : Mysql Fabric
 
Data massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesData massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodes
 
Webinar slides: Introduction to Database Proxies (for MySQL)
Webinar slides: Introduction to Database Proxies (for MySQL)Webinar slides: Introduction to Database Proxies (for MySQL)
Webinar slides: Introduction to Database Proxies (for MySQL)
 
Building Scalable High Availability Systems using MySQL Fabric
Building Scalable High Availability Systems using MySQL FabricBuilding Scalable High Availability Systems using MySQL Fabric
Building Scalable High Availability Systems using MySQL Fabric
 
MySQL Fabric Tutorial, October 2014
MySQL Fabric Tutorial, October 2014MySQL Fabric Tutorial, October 2014
MySQL Fabric Tutorial, October 2014
 
Running Galera Cluster on Microsoft Azure
Running Galera Cluster on Microsoft AzureRunning Galera Cluster on Microsoft Azure
Running Galera Cluster on Microsoft Azure
 
MySQL Group Replication - an Overview
MySQL Group Replication - an OverviewMySQL Group Replication - an Overview
MySQL Group Replication - an Overview
 
Built-in query caching for all PHP MySQL extensions/APIs
Built-in query caching for all PHP MySQL extensions/APIsBuilt-in query caching for all PHP MySQL extensions/APIs
Built-in query caching for all PHP MySQL extensions/APIs
 
NoSQL in MySQL
NoSQL in MySQLNoSQL in MySQL
NoSQL in MySQL
 
Choosing a MySQL High Availability solution - Percona Live UK 2011
Choosing a MySQL High Availability solution - Percona Live UK 2011Choosing a MySQL High Availability solution - Percona Live UK 2011
Choosing a MySQL High Availability solution - Percona Live UK 2011
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability Solutions
 
Introduction to Galera
Introduction to GaleraIntroduction to Galera
Introduction to Galera
 
MySQL Options in OpenStack
MySQL Options in OpenStackMySQL Options in OpenStack
MySQL Options in OpenStack
 
Mysql User Camp : 20th June - Mysql New Features
Mysql User Camp : 20th June - Mysql New FeaturesMysql User Camp : 20th June - Mysql New Features
Mysql User Camp : 20th June - Mysql New Features
 
Choosing between Codership's MySQL Galera, MariaDB Galera Cluster and Percona...
Choosing between Codership's MySQL Galera, MariaDB Galera Cluster and Percona...Choosing between Codership's MySQL Galera, MariaDB Galera Cluster and Percona...
Choosing between Codership's MySQL Galera, MariaDB Galera Cluster and Percona...
 
Tips to drive maria db cluster performance for nextcloud
Tips to drive maria db cluster performance for nextcloudTips to drive maria db cluster performance for nextcloud
Tips to drive maria db cluster performance for nextcloud
 
MySQL High Availability and Disaster Recovery with Continuent, a VMware company
MySQL High Availability and Disaster Recovery with Continuent, a VMware companyMySQL High Availability and Disaster Recovery with Continuent, a VMware company
MySQL High Availability and Disaster Recovery with Continuent, a VMware company
 
Using MySQL in Automated Testing
Using MySQL in Automated TestingUsing MySQL in Automated Testing
Using MySQL in Automated Testing
 

En vedette

MySQL High-Availability and Scale-Out architectures
MySQL High-Availability and Scale-Out architecturesMySQL High-Availability and Scale-Out architectures
MySQL High-Availability and Scale-Out architectures
FromDual GmbH
 
High Availability with MySQL
High Availability with MySQLHigh Availability with MySQL
High Availability with MySQL
Thava Alagu
 
MySQL Performance Tuning
MySQL Performance TuningMySQL Performance Tuning
MySQL Performance Tuning
FromDual GmbH
 

En vedette (20)

MySQL Sharding: Tools and Best Practices for Horizontal Scaling
MySQL Sharding: Tools and Best Practices for Horizontal ScalingMySQL Sharding: Tools and Best Practices for Horizontal Scaling
MySQL Sharding: Tools and Best Practices for Horizontal Scaling
 
Distributed RDBMS: Data Distribution Policy: Part 3 - Changing Your Data Dist...
Distributed RDBMS: Data Distribution Policy: Part 3 - Changing Your Data Dist...Distributed RDBMS: Data Distribution Policy: Part 3 - Changing Your Data Dist...
Distributed RDBMS: Data Distribution Policy: Part 3 - Changing Your Data Dist...
 
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
 
Sharding using MySQL and PHP
Sharding using MySQL and PHPSharding using MySQL and PHP
Sharding using MySQL and PHP
 
Distributed RDBMS: Data Distribution Policy: Part 2 - Creating a Data Distrib...
Distributed RDBMS: Data Distribution Policy: Part 2 - Creating a Data Distrib...Distributed RDBMS: Data Distribution Policy: Part 2 - Creating a Data Distrib...
Distributed RDBMS: Data Distribution Policy: Part 2 - Creating a Data Distrib...
 
사례를 통해 알아보는 IoT 분석 플랫폼 요건
사례를 통해 알아보는 IoT 분석 플랫폼 요건사례를 통해 알아보는 IoT 분석 플랫폼 요건
사례를 통해 알아보는 IoT 분석 플랫폼 요건
 
Getting Started with PL/Proxy
Getting Started with PL/ProxyGetting Started with PL/Proxy
Getting Started with PL/Proxy
 
MySQL High-Availability and Scale-Out architectures
MySQL High-Availability and Scale-Out architecturesMySQL High-Availability and Scale-Out architectures
MySQL High-Availability and Scale-Out architectures
 
MySQL Proxy: Architecture and concepts of misuse
MySQL Proxy: Architecture and concepts of misuseMySQL Proxy: Architecture and concepts of misuse
MySQL Proxy: Architecture and concepts of misuse
 
MySQL Fabric: High Availability using Python/Connector
MySQL Fabric: High Availability using Python/ConnectorMySQL Fabric: High Availability using Python/Connector
MySQL Fabric: High Availability using Python/Connector
 
High Availability with MySQL
High Availability with MySQLHigh Availability with MySQL
High Availability with MySQL
 
MySQL Performance Tuning
MySQL Performance TuningMySQL Performance Tuning
MySQL Performance Tuning
 
MySQL highav Availability
MySQL highav AvailabilityMySQL highav Availability
MySQL highav Availability
 
MySQL Proxy. From Architecture to Implementation
MySQL Proxy. From Architecture to ImplementationMySQL Proxy. From Architecture to Implementation
MySQL Proxy. From Architecture to Implementation
 
DIY: A distributed database cluster, or: MySQL Cluster
DIY: A distributed database cluster, or: MySQL ClusterDIY: A distributed database cluster, or: MySQL Cluster
DIY: A distributed database cluster, or: MySQL Cluster
 
MySQL Proxy tutorial
MySQL Proxy tutorialMySQL Proxy tutorial
MySQL Proxy tutorial
 
Distributed RDBMS: Data Distribution Policy: Part 1 - What is a Data Distribu...
Distributed RDBMS: Data Distribution Policy: Part 1 - What is a Data Distribu...Distributed RDBMS: Data Distribution Policy: Part 1 - What is a Data Distribu...
Distributed RDBMS: Data Distribution Policy: Part 1 - What is a Data Distribu...
 
Inside PyMongo - MongoNYC
Inside PyMongo - MongoNYCInside PyMongo - MongoNYC
Inside PyMongo - MongoNYC
 
MySQL HA Solutions
MySQL HA SolutionsMySQL HA Solutions
MySQL HA Solutions
 
MySQL Proxy. A powerful, flexible MySQL toolbox.
MySQL Proxy. A powerful, flexible MySQL toolbox.MySQL Proxy. A powerful, flexible MySQL toolbox.
MySQL Proxy. A powerful, flexible MySQL toolbox.
 

Similaire à Methods of Sharding MySQL

Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Arch
guest18a0f1
 
Rails Conf Europe 2007 Notes
Rails Conf  Europe 2007  NotesRails Conf  Europe 2007  Notes
Rails Conf Europe 2007 Notes
Ross Lawley
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Cal Henderson
 

Similaire à Methods of Sharding MySQL (20)

Where Django Caching Bust at the Seams
Methods of Sharding MySQL

  • 1. Methods of Sharding MySQL Percona Live NYC 2012 Who are Palomino? Bespoke Services: we work with and like you. Production Experienced: senior DBAs, admins, and engineers. 24x7: globally-distributed on-call staff. Short-term no-lock-in contracts. Professional Services (DevOps): ➢ Chef, ➢ Puppet, ➢ Ansible. Big Data Cluster Administration (OpsDev): ➢ MySQL, PostgreSQL, ➢ Cassandra, HBase, ➢ MongoDB, Couchbase.
  • 2. Methods of Sharding MySQL Percona Live NYC 2012 Who am I? Tim Ellis CTO/Principal Architect, Palomino Achievements: ➢ Palomino Big Data Strategy. ➢ Datawarehouse Cluster at Riot Games. ➢ Back-end Storage Architecture for Firefox Sync. ➢ Led DB teams at Digg for four years. ➢ Harassed the Reddit team at one of their parties. Ensured Successful Business for: ➢ Digg, Friendster, ➢ Riot Games, ➢ Mozilla, ➢ StumbleUpon.
  • 3. Methods of Sharding MySQL What is this Talk? Large cluster admin: when one DB isn't enough. ➢ What is a shard? ➢ What shard types can I choose? ➢ How to build a large DB cluster. ➢ How to administer that giant mess of DBs. Types of large clusters: ➢ Just a bunch of databases. ➢ Distributed database across machines.
  • 4. Methods of Sharding MySQL Where the Focus will Lie 12% – Sharding theory/considerations. 25% – Building a Cluster to administer (tutorial): ➢ Palomino Cluster Tool. 50% – Flexible large-cluster administration (tutorial): ➢ Tumblr's Jetpants. 13% – Other sharding technologies (talk-only): ➢ Youtube's Vtocc (Vitess), ➢ Twitter's Gizzard, ➢ HAproxy.
  • 5. Methods of Sharding MySQL What about the Silver Bullets? NoSQL Distributed Databases: ➢ Promise “sharding” for free, ➢ Uptime and horizontal scaling trivially. Reality: ➢ RDBMS is 40-yr-old tech, ➢ NoSQL is 10-yr-old tech, ➢ Which is responsible for how many high-profile downtimes in the past 10 years? ➢ Evaluate the alternatives without illusions.
  • 6. Methods of Sharding MySQL What is a Shard? A location for a subset of data: ➢ Itself made of pieces. ➢ Typically itself redundant. Shard for User Data Shard for Logging Data Shard for Posts Data Master Master Master Slave Slave Slave Slave Slave Slave Slave Slave Slave
  • 7. Methods of Sharding MySQL What are the Sharding Method Choices? By-Function: ➢ Move busy tables onto new shard. ➢ Writes of busiest tables on new hardware. ➢ Writes of remaining tables on current. By-Columns: ➢ Split table into chunks of related columns, store each set on its own Master/Slaves shard. By-Rows: ➢ A table is split into N shards, shard gets a subset of the rows of the table.
  • 8. Methods of Sharding MySQL Shard Method Choices By-function and By-Column Methods: ➢ Much easier. ➢ Can get you through months to years. ➢ Eventually you run out of options here. By-Row Method: ➢ The hardest to do. ➢ Requires new ways of accessing data. ➢ Often requires sophisticated cache strategies. ➢ Itself can be done several ways.
  • 9. Methods of Sharding MySQL By-Function Sharding Picking a Functional Split: ➢ A subset of tables commonly joined. ➢ Tables outside this subset nearly never joined. ➢ One of them responsible for many writes. Every table outside subset requires rewriting JOINs into code-based multi-SELECTs. Once subset of tables moved onto their own server, writes are distributed.
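A minimal sketch of what "rewriting JOINs into code-based multi-SELECTs" can look like after a functional split. Two in-memory SQLite databases stand in for the two MySQL shards; the `users` and `posts` tables, their columns, and the `posts_with_authors` helper are hypothetical names chosen for illustration, not something from the talk.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two functional shards.
# Table and column names are assumptions made for this illustration.
users_db = sqlite3.connect(":memory:")
posts_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

posts_db.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
posts_db.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(10, 1, "first"), (11, 2, "second"), (12, 1, "third")],
)


def posts_with_authors():
    """Replace `posts JOIN users` with two SELECTs stitched together in code."""
    # First SELECT: rows from the shard that now holds the busy table.
    posts = posts_db.execute(
        "SELECT id, user_id, title FROM posts ORDER BY id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in posts})
    if not user_ids:
        return []
    # Second SELECT: fetch only the referenced rows from the other shard.
    placeholders = ",".join("?" for _ in user_ids)
    names = dict(
        users_db.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
        ).fetchall()
    )
    # The "join" itself happens in application code.
    return [(post_id, title, names.get(user_id)) for post_id, user_id, title in posts]
```

The cost is one extra round trip per former JOIN, which is why the slide asks for a subset of tables that are rarely joined with anything outside it.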
  • 10. Methods of Sharding MySQL By-Column Sharding (Vertical Partition) Identifying candidate table: ➢ Many columns (“users” anyone?), ➢ Many updates, ➢ Many indexes. Required: even split of columns/indexes by update frequency. Attempt: logical grouping. JOINs neither possible nor desirable: write multi-SELECT code in application DAL.
  • 11. Methods of Sharding MySQL Row-based Sharding Choices Range-based Sharding: ➢ Easy to understand. ➢ Each shard gets a range of rows. ➢ Oft-times some shards are “hot.” ➢ Hot shards are split into separate shards. ➢ Cold shards are joined into a single shard. ➢ Juggling shard load is a frequent process. Typically the best solution. Shortcomings have known work-arounds.
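Range-based routing can be sketched as a sorted list of range starts plus a binary search. The `RangeShardMap` class and the shard names below are hypothetical; this only illustrates the lookup and the "split a hot shard" step, not how any particular tool stores its topology.

```python
import bisect


class RangeShardMap:
    """Maps integer row keys to shards by contiguous, inclusive ID ranges."""

    def __init__(self, min_id, max_id, shard):
        self.min_id = min_id
        self.max_id = max_id
        # starts[i] is the first ID owned by shards[i]; ranges are contiguous,
        # so each range ends just before the next start (or at max_id).
        self.starts = [min_id]
        self.shards = [shard]

    def shard_for(self, key):
        if not self.min_id <= key <= self.max_id:
            raise KeyError(key)
        return self.shards[bisect.bisect_right(self.starts, key) - 1]

    def split(self, split_at, new_shard):
        """Split the range containing split_at; IDs >= split_at move to new_shard."""
        if not self.min_id < split_at <= self.max_id:
            raise ValueError("split point outside key space")
        if split_at in self.starts:
            raise ValueError("a range already starts at this ID")
        idx = bisect.bisect_right(self.starts, split_at)
        self.starts.insert(idx, split_at)
        self.shards.insert(idx, new_shard)
```

For example, starting from one shard covering 1 to 999999999, `split(400001, "shard-b")` leaves keys up to 400000 on the original shard and routes the rest to `shard-b`. Only the routing changes here; moving the rows is the expensive part.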
  • 12. Methods of Sharding MySQL Row-based Sharding Choices Modulus/Hash-based Sharding: ➢ Row key is hashed to integer modulo number of shards, then placed on that shard. ➢ Only rarely are some shards “hot.” ➢ Shard splitting is difficult to implement. Also a common method of sharding. We hope not to split shards often (or ever). When we do, it's a multi-week process.
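A short sketch of modulus/hash placement, and of why changing the shard count is painful. The choice of MD5 and the helper names `shard_for` and `fraction_moved` are assumptions for illustration; any stable hash would do.

```python
import hashlib


def shard_for(key, num_shards):
    """Hash the row key to an integer, then take it modulo the shard count."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)
```

With a uniform hash, going from 4 to 5 shards relocates roughly 80% of keys, since a key stays put only when its hash gives the same remainder under both moduli. That is the arithmetic behind "we hope not to split shards often."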
  • 13. Methods of Sharding MySQL Row-based Sharding Choices Lookup Table-based Sharding: ➢ Easy to understand. ➢ Row key mapped to shard in a lookup table. ➢ Easy to move load off hot shards. ➢ Lookup table method is problematic: ➢ Single point of failure. ➢ Performance bottleneck. ➢ Billions of rows, itself may need sharding.
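Lookup-table routing, sketched with an in-process dict standing in for the lookup table. The `LookupShardMap` class is hypothetical. In a real deployment the directory is itself a database, which is where the single-point-of-failure and bottleneck concerns above come from.

```python
class LookupShardMap:
    """Explicit key -> shard directory; a dict stands in for the lookup table."""

    def __init__(self, default_shard):
        self.default_shard = default_shard
        self.directory = {}

    def assign(self, key, shard):
        self.directory[key] = shard

    def shard_for(self, key):
        # Every read and write pays for this lookup first.
        return self.directory.get(key, self.default_shard)

    def move(self, keys, to_shard):
        """Relieve a hot shard by repointing individual keys.

        Copying the rows themselves is out of scope for this sketch.
        """
        for key in keys:
            self.directory[key] = to_shard
```

Moving load is a per-key update rather than a range split or a rehash, which is the flexibility this method buys in exchange for the extra lookup.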
  • 14. Prerequisite: Build a Large Cluster Allocating the Hardware Getting Hardware – your own company's: ➢ Can be politically-charged. ➢ Get a small batch first. ➢ Build small demonstration cluster. ➢ Get everyone on-board with the demo. Renting/Leasing Hardware – the Cloud: ➢ Allocate hardware in EC2 or elsewhere. ➢ Usually easier, but possibly harder admin: ➢ Hardware failure more common. ➢ Hardware/network flakiness more common.
  • 15. Prerequisite: Build a Large Cluster Building the Cluster Okay, I've got the hardware. What next?
  • 16. Prerequisite: Build a Large Cluster Building the Cluster Configuring the Hardware. The old dilemma: ➢ Spend days to install/configure DB software? Subsequent management is painful. ➢ Use SSH in “for” loops? Rolling your own configuration management tools is a lot of work. ➢ Learn a configuration management tool? Obvious choice in 2012. Well-documented tools like Chef, Puppet, Ansible.
  • 17. Configuration Management Tools My Experience Puppet: 6 years ago at Digg ➢ Manage/Deploy hundreds of servers. ➢ Painful, but not as bad as hand-coding it all. Chef: 2 years ago at Drawn to Scale and Riot ➢ Manage/Deploy dozens of servers. ➢ Learning Ruby is a “joy” of its own. Ansible: 6 months ago at Palomino ➢ Manage/Deploy dozens of servers. ➢ First Palomino Cluster Tool subset built.
  • 18. Prerequisite: Build a Large Cluster Configuration Management Options Pick your Configuration Management: ➢ Chef: Popular, use Ruby to “code your infrastructure.” Must learn Ruby. ➢ Puppet: Mature, use data structures to “define your infrastructure.” Less coding. ➢ Ansible: Tiny and modular, similar to Puppet, but with ordering for deployment. Pragmatic. Write/Get Recipes, Manifests, Playbooks? ➢ Writing is tedious. Can take >1 week. ➢ Get from internet? Often incomplete.
  • 19. Prerequisite: Build a Large Cluster The Palomino Cluster Tool Palomino's tool for building large DB clusters: ➢ Chef, Puppet, Ansible modules. ➢ Open-source on Github. ➢ https://github.com/time-palominodb/PalominoClusterTool ➢ Google: “Palomino Cluster Tool.” ➢ Will build a large cluster for you in hours: ➢ Master(s) ➢ Slaves – hundreds of them as easy as two. ➢ MHA – when master fails, a slave takes over. ➢ Previously this would take days.
  • 20. The Palomino Cluster Tool Building the Management Node Cluster Management Node: ➢ Will build the initial cluster. ➢ Will do subsequent cluster management. Tool for Initial Cluster Build: ➢ Palomino Cluster Tool (Ansible subset). Tool for Cluster Management: ➢ Jetpants (Ruby).
  • 21. The Palomino Cluster Tool Building the Management Node Palomino Cluster Tool (Ansible subset). Why Ansible? ➢ No server to set up, simply uses SSH. ➢ Easy-to-understand non-code Playbooks. ➢ Use a language you know for modules. ➢ For demo purposes, obvious choice. ➢ Also production-worthy: ➢ Built by Michael DeHaan, long-time configuration management guru.
  • 22. The Palomino Cluster Tool Building the Management Node Management node lives alongside your cluster. ➢ We are building our cluster in EC2. ➢ Thus management node in EC2. ➢ This tutorial assumes Ubuntu 12.04. ➢ t1.micro is fine for management node. Install basic tools: ➢ apt-get install git (for Ansible/P.C.T.) ➢ apt-get install make python-jinja2 (for Ansible)
  • 23. The Palomino Cluster Tool Configuring the Management Node Install Ansible: ➢ git clone git://github.com/ansible/ansible.git ➢ make install Install Palomino Cluster Tool: ➢ git clone git://github.com/time-palominodb/PalominoClusterTool.git I think we just finished the management node!
  • 24. The Palomino Cluster Tool Allocating Shard Nodes Shard nodes: ➢ m1.small or larger: at least 1.6GB RAM, ➢ :3306, :80, and :22 open between all (one security group in EC2), ➢ Ubuntu 12.04 (other Debian-alikes at your own risk – but may work!). Do not need OS/database configuration: ➢ Ansible will configure them.
  • 25. The Palomino Cluster Tool Building the First Shard – Step 1 From README: edit IP addresses in cluster layout file (PalominoClusterToolLayout.ini):
        # Alerting/Trending -----
        [alertmaster]
        10.252.157.110
        [trendmaster]
        10.252.157.110
        # Servers -----
        [mhamanager]
        10.252.157.110
    This section identical for all Shards.
  • 26. The Palomino Cluster Tool Building the First Shard – Step 2 From README: edit IP addresses in cluster layout file (PalominoClusterToolLayout.ini):
        [mysqlmasters]
        10.244.17.6
        [mysqlslaves]
        10.244.26.199
        10.244.18.178
        [mysqls:vars]
        master_host=10.244.17.6
    This section different for every Shard.
  • 27. The Palomino Cluster Tool Building the First Shard – Step 3 Run setup command to put configuration and SSH keys into /etc:
        $ cd PalominoClusterTool/AnsiblePlaybooks/Ubuntu-12.04
        $ ./00-Setup_PalominoClusterTool.sh ShardA
    Run build command – it's a wrapper around Ansible Playbooks:
        $ ./10-MySQL_MHA_Manager.sh ShardA
  • 28. The Palomino Cluster Tool Building the Second Shard Just make one shard with a master and many slaves. In real life, you might do something like this instead:
        for i in ShardB ShardC ShardD ; do
          (manual step): vim PalominoClusterToolLayout.ini
          (scriptable steps):
            ./00-Setup_PalominoClusterTool.sh $i
            ./10-MySQL_MHA_Manager.sh $i
        done
    Run them in separate terminals to save time.
  • 29. Make the Cluster Real Data makes Shard Split Interesting Fill ShardA using random data script.* Palomino Cluster Tool includes such a tool. ➢ HelperScripts/makeGiantDatafile.pl
        $ ssh root@sharda-master
        # cd PalominoClusterTool/HelperScripts
        # mysql -e 'create database palomino'
        # ./makeGiantDatafile.pl 1200000 3 | mysql -f palomino
    Install Jetpants, do shard split now. * Be sure /var/lib/mysql is on large partition!
  • 30. Administering the Cluster Install Jetpants General idea: Install Ruby >=1.9.2 and RubyGems, then Jetpants via RubyGems. On my systems, /etc/alternatives is always incorrect, so ln the proper binaries for Jetpants.
        # apt-get install ruby1.9.3 rubygems libmysqlclient-dev
        # ln -sf /usr/bin/ruby1.9.3 /etc/alternatives/ruby
        # ln -sf /usr/bin/gem1.9.3 /etc/alternatives/gem
        # gem install jetpants
  • 31. Administering the Cluster Configure Jetpants General idea: edit /etc/jetpants.yaml, then create the Jetpants inventory and application configuration files and chown them to the Jetpants user:
        # vim /etc/jetpants.yaml
        # mkdir -p /var/jetpants
        # touch /var/jetpants/assets.json
        # chown jetpantsusr: /var/jetpants/assets.json
        # mkdir -p /var/www
        # touch /var/www/databases.yaml
        # chown jetpantsusr: /var/www/databases.yaml
  • 32. Administering the Cluster Jetpants Shard Splits Tell Jetpants Console about your ShardA:
        Jetpants> s = Shard.new(1, 999999999, '10.12.34.56', :ready) #10.12.34.56==ShardA master
        Jetpants> s.sync_configuration
    Create spares within Console for all others (improved workflow in Jetpants 0.7.8):
        Jetpants> topology.tracker.spares << '10.23.45.67'
        Jetpants> topology.tracker.spares << '10.23.45.68'
        Jetpants> topology.tracker.spares << '10.23.45.69'
        Jetpants> topology.write_config
        Jetpants> topology.update_tracker_data
  • 33. Administering the Cluster Jetpants Shard Splits Just for this tutorial: ➢ Create the “palomino” database, ➢ Break the replication on all the spares, ➢ Be sure spares are read/write: ➢ Edit my.cnf, ➢ service mysql restart ➢ Ensure “jetpants pools” proper: ➢ One master, ➢ Two slaves.
  • 34. Administering the Cluster Jetpants Shard Splits How to perform an actual Shard Split: $ jetpants shard_split --min-id=1 --max-id=999999999 Notes: ➢ Process takes hours. Use screen or nohup. ➢ LeftID == parent's first, RightID == parent's last, no overlap/gap. ➢ Make children 1-300000,300001-999999999.
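The "no overlap/gap" rule for child ranges is easy to check before starting an hours-long split. The helper below is a hypothetical pre-flight check written for illustration; it is not part of Jetpants.

```python
def check_split(parent, children):
    """Return True if the children tile the parent range exactly.

    parent and each child are (min_id, max_id) tuples with inclusive bounds.
    The first child must start at the parent's first ID, the last child must
    end at the parent's last ID, and neighbours must be exactly adjacent.
    """
    ordered = sorted(children)
    if not ordered:
        return False
    if ordered[0][0] != parent[0] or ordered[-1][1] != parent[1]:
        return False
    if any(lo > hi for lo, hi in ordered):
        return False
    # Adjacent children must meet with neither a gap nor an overlap.
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))
```

With the ranges from the notes above, `check_split((1, 999999999), [(1, 300000), (300001, 999999999)])` passes, while a second child starting at 300000 (overlap) or 300002 (gap) fails.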
  • 35. Jetpants Shard Splitting The Gory Details After “jetpants shard_split”:
        ubuntu@ip-10-252-157-110:~$ jetpants pools
        shard-1-999999999 [3GB]
            master = 10.244.136.107 ip-10-244-136-107
            standby slave 1 = 10.244.143.195 ip-10-244-143-195
            standby slave 2 = 10.244.31.91 ip-10-244-31-91
        shard-1-400000 (state: replicating) [2GB]
            master = 10.244.144.183 ip-10-244-144-183
        shard-400001-999999999 (state: replicating) [1GB]
            master = 10.244.146.27 ip-10-244-146-27

        0 global pools
        3 shard pools
        ---- --------------
        3 total pools

        3 masters
        0 active slaves
        2 standby slaves
        0 backup slaves
        ---- --------------
        5 total nodes
  • 36. Jetpants Improvements The Result of an Experiment Jetpants only well-tested on RHEL/CentOS. Palomino Cluster Tool only well-tested to build Ubuntu 12.04 clusters. Little effort to fix Jetpants: ➢ /sbin/service location different, ➢ service mysql status output different.
  • 37. Jetpants Improvements The Result of an Experiment Jetpants only well-tested on MySQL 5.1. I built a cluster of MySQL 5.5. A little more effort to fix Jetpants: ➢ Set master_host=' ' is syntax error, ➢ reset slave needs keyword “all” appended.
  • 38. Jetpants Improvements The Result of an Experiment Jetpants only well-tested on large datasets. I built a cluster with only hundreds of MB. A wee tad more effort to fix Jetpants: ➢ Some timings assumed large datasets, ➢ Edge cases for small/quick operations reported back to the author.
  • 39. Jetpants Improvements OSS Collaboration and Win Evan Elias implemented these fixes last week! ➢ jetpants add_pool, ➢ jetpants add_shard, ➢ jetpants add_spare (with sanity-check spare), ➢ Shards with 1 slave (not for prod!), ➢ read_only spares not fatal, ➢ Debian-alike (Ubuntu) fixes, ➢ MySQL 5.5 fixes, ➢ Mid-split Jetpants pools output simpler. Really responsive ownership of project!
  • 40. Twitter's Gizzard What is it? General Framework for distributed database. ➢ Hides sharding from you. ➢ Literally, it is middleware. ➢ Applications connect to Gizzard, ➢ Gizzard sends connections to proper place, ➢ Shard splits and hardware failure taken care of. ➢ Created at Twitter by rogue cowboys. ➢ Not completely production-ready. ➢ Better than rolling your own!
  • 41. Twitter's Gizzard Why should I use it? You've settled on row-based partition scheme: ➢ Master nearing I/O capacity, won't scale up, ➢ Can't move some tables to their own pool, ➢ Can't split the columns/indexes out, ➢ You want to keep using the DBMS you already know and love: Percona Server.* ➢ Don't want to think about fault-tolerance or shard splits (much), * Actually use any storage back-end.
  • 42. Twitter's Gizzard The Fine Print This sounds perfect. Why not Gizzard? Writes must follow strict diet. Must be: ➢ Idempotent*, ➢ Commutative**, ➢ Must not have tuberculosis. * Pfizer cannot remove the idempotency requirement of Gizzard. ** Even on evenings and weekends.
  • 43. Twitter's Gizzard Expanding the Fine Print Idempotency: ➢ Submit a write. Again. And again. ➢ Must be identical to doing it once. ➢ Bad: “update set col = col + 1” Commutative – writes in arbitrary order: ➢ WriteA→WriteB→WriteC on Node1. ➢ WriteB→WriteC→WriteA on Node2. ➢ Bad: “update set col1 = 42”→“update set col2 = col1 + 5”
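The two requirements can be tested mechanically. The sketch below models the slide's SQL examples as Python functions mutating a row dict; the helper names are hypothetical and nothing here is Gizzard's API (Gizzard itself is written in Scala).

```python
from itertools import permutations


def apply_all(writes, row):
    """Apply writes in order to a copy of the row and return the result."""
    state = dict(row)
    for write in writes:
        write(state)
    return state


def is_idempotent(write, row):
    """Applying the write twice must leave the same state as applying it once."""
    return apply_all([write], row) == apply_all([write, write], row)


def is_commutative(writes, row):
    """Every ordering of the writes must converge on the same final state."""
    results = [apply_all(order, row) for order in permutations(writes)]
    return all(result == results[0] for result in results)


# The slide's examples, modelled as functions on a row dict.
def increment(state):      # update set col = col + 1
    state["col"] = state["col"] + 1


def set_col1(state):       # update set col1 = 42
    state["col1"] = 42


def derive_col2(state):    # update set col2 = col1 + 5
    state["col2"] = state["col1"] + 5
```

On a row of zeros, `increment` fails the idempotency check and the pair `set_col1`, `derive_col2` fails the commutativity check, matching the two "Bad" examples. A plain `set_col1` passes both.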
  • 44. Twitter's Gizzard Expanding the Fine Print Cluster is Eventually Consistent: ➢ May return old values for reads. ➢ Unknown when consistency will occur. Like a politician's position on the budget: ➢ Might be consistent in the future. ➢ Just not right now. ➢ Or now.
  • 45. Twitter's Gizzard Working Around the Shortcomings Gizzard work-around: ➢ Add timestamp to every transaction. ➢ Good: ➢ “col1.ts=1; update set col1=42” → ➢ “update set col2=col1 + 5 where col1.ts=1” ➢ Implementation trickier if DBMS doesn't support column attributes. Cannot escape: must radically re-think schema and application/DBMS interaction.
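One way to read the timestamp work-around, sketched under stated assumptions: each column stores a `(value, ts)` pair, a dependent write is a no-op until the version it depends on is present, and rejected writes are retried later. The retry loop, the storage layout, and all names below are assumptions made for illustration, not Gizzard's implementation.

```python
def set_col1(row):
    """col1.ts=1; update set col1=42 (applies only if newer than what is stored)."""
    _, ts = row.get("col1", (None, 0))
    if ts < 1:
        row["col1"] = (42, 1)
    return True


def derive_col2(row):
    """update set col2=col1+5 where col1.ts=1 (rejected until col1 is at ts 1)."""
    value, ts = row.get("col1", (None, 0))
    if ts != 1:
        return False  # precondition not met; the caller retries later
    row["col2"] = (value + 5, 1)
    return True


def replay(writes, row):
    """Apply writes in the given order, retrying any whose precondition failed."""
    pending = list(writes)
    while pending:
        remaining = [write for write in pending if not write(row)]
        if len(remaining) == len(pending):
            raise RuntimeError("no progress: a prerequisite write is missing")
        pending = remaining
    return row
```

Under these assumptions, replaying the two writes in either order, or with duplicates, converges on `col1 = 42` and `col2 = 47`. The timestamp guard is what makes the pair behave as idempotent and commutative.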
  • 46. Twitter's Gizzard Trying it Out I'm convinced! How do I begin? ➢ Learn Scala. ➢ Clone “rowz” from Github. ➢ https://github.com/twitter/Rowz ➢ Modify it to suit your needs. ➢ Learn how it interacts with existing tools. ➢ Write new monitoring/alerting plugins. ➢ Write unit tests! ➢ You should OSS it to help with overhead.
  • 47. Twitter's Gizzard Trying it Out Sounds daunting. Maybe I'll roll my own? Learn from others' mistakes: ➢ Digg: 2 engineers 6 months. Code thrown away. Digg out of business. ➢ Countless identical stories in Silicon Valley. NIHS attitude == Go out of business*. * 8-figure R&D budgets excepted.
  • 48. Youtube's Vitess/Vtocc What is it? Vitess is a library. Vtocc is an implementation using it. Vtocc is another middleware solution. ➢ Sharding, ➢ Caching, ➢ Connection-pooling, ➢ In-use at Youtube, ➢ Built-in fail-safe features.
  • 49. Youtube's Vtocc Why use it? Proven high-volume sharding solution. Interesting feature-list: ➢ Auto query/transaction over-limit killing. ➢ Better query-cache implementation. ➢ Query comment-stripping for query cache. ➢ Query consolidation. ➢ Zero downtime restarts. Less coding than Gizzard (more plug-in).
  • 50. Youtube's Vtocc Hold on, Zero Downtime Restarts? Just start new Vtocc instance. ➢ Instance1 passes new requests to Instance2, ➢ Instance1's connections get 30s to complete, ➢ Instance2 kills Instance1 and takes over. Vtocc Instance 1 Vtocc Instance 2
  • 51. Youtube's Vtocc The Fine Print Requires Particular Primary Keys: ➢ varbinary datatype, ➢ Choose carefully to prevent hot-spots. Max result-set size: larger resultsets fail. Additional administration burden: ➢ “My query was killed. Why?” ➢ Middleware adds spooky hard-to-diagnose failure modes.
  • 52. Youtube's Vtocc Implementation Details ➢ Run Vtocc on same server as MySQL. ➢ Configure Vtocc fail-safes for expected load: ➢ Pool Size (connection count), ➢ Max Transactions (has own connection pool), ➢ Query Timeout (before killed), ➢ Transaction Timeout (before killed), ➢ Max Resultset Size in rows ➢ Go language doesn't free allocated memory, so pick this value carefully. ➢ More details: http://code.google.com/p/vitess/wiki/Operations
  • 53. HAproxy Re-thinking Proxy Topology Old-school Proxy Topology: ➢ DB Clients on one side, ➢ DB Servers on the other, ➢ Proxy in-between. Single Point of Failure
  • 54. HAproxy Re-thinking Proxy Topology Free proxy provides new architecture option: ➢ Proxy on every DB client node. ➢ Good-bye single-point-of-failure. ➢ Hello configuration management for proxy. HAproxy HAproxy HAproxy HAproxy HAproxy
  • 55. Methods of Sharding MySQL Q&A Questions? Suggestions: ➢ Interesting stuff. Got a job for me? ➢ Well I got a job for you. Interested? ➢ Warn me next time so I can sleep in the back row. ➢ Was that a question? Thank you! Emails to domain palominodb, username time. Percona Live 2012 in New York City. Enjoy the rest of the show!